Ordered Response Models
Summary: We discuss regression models for ordered responses, such as ratings of bonds,
schooling attainment, or measures of subjective well-being. Commonly used models in
this context are the ordered logit and ordered probit regression models. They are based
on an underlying latent model with single index function and constant thresholds. We
argue that these approaches are overly restrictive and preclude a flexible estimation of
the effect of regressors on the discrete outcome probabilities. For example, the signs of
the marginal probability effects can only change once when moving from the smallest
category to the largest one. We then discuss several alternative models that overcome
these limitations. An application illustrates the benefit of these alternatives.
1. Introduction
Regression models for ordered responses, i.e. statistical models in which the
outcome of an ordered dependent variable is explained by a number of ar-
bitrarily scaled independent variables, have their origin in the biometrics
literature. Aitchison and Silvey (1957) proposed the ordered probit model
to analyze experiments in which the responses of subjects to various doses
of stimulus are divided into ordinally ranked classes. Snell (1964) suggested
the use of the logistic instead of the normal distribution as an approxima-
tion for mathematical simplification. The first comprehensive treatment of
ordered response models in the social sciences appeared with the work of
McKelvey and Zavoina (1975) who generalized the model of Aitchison and
Silvey to more than one independent variable. Their basic idea was to as-
sume the existence of an underlying continuous latent variable – related to
a single index of explanatory variables and an error term – and to obtain
the observed categorical outcome by discretizing the real line into a finite
number of intervals.
McCullagh (1980) developed independently the so-called cumulative model
in the statistics literature. He directly modelled the cumulative probabili-
ties of the ordered outcome as a monotonic increasing transformation of a
linear predictor onto the unit interval, assuming a logit or probit link func-
tion. This specification yields the same probability function as the model
of McKelvey and Zavoina, and is therefore observationally equivalent. Both
papers spurred a large literature on how to model ordered dependent vari-
ables, the former mostly in the social sciences, the latter predominantly in
the medical and biostatistics literature.
On the one hand, a number of parametric generalizations have been pro-
posed. These include alternative link functions, prominent examples being
∗ We are grateful to an anonymous referee for valuable comments.
where κj and β(k×1) denote unknown model parameters, and F can be any
monotonic increasing function mapping the real line onto the unit interval.
Although no further restrictions are imposed a priori on the transformation
F, it is standard to replace F by a distribution function, most commonly
the standard normal (which yields the ordered probit model) or the logistic
distribution (which yields the ordered logit model). In what follows we
assume that F represents either the standard normal or the logistic
distribution. In order to ensure well-defined probabilities, we require
that κj > κj−1 , ∀j, and it is understood that κJ = ∞ such that F (∞) = 1
as well as κ0 = −∞ such that F (−∞) = 0.
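As a minimal sketch of how these outcome probabilities can be computed, the snippet below assumes the usual form Pr[y = j|x] = F(κj − x′β) − F(κj−1 − x′β), which is consistent with the marginal effects in (3), and uses the standard normal for F. The function name `ordered_probs` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

def ordered_probs(x, beta, kappa, cdf=norm.cdf):
    """Pr[y = j | x] for j = 1..J, given the J-1 interior thresholds kappa."""
    kappa = np.asarray(kappa, dtype=float)
    if np.any(np.diff(kappa) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    index = np.atleast_2d(x) @ np.asarray(beta, dtype=float)  # x'beta, one value per row
    # Pad with kappa_0 = -inf and kappa_J = +inf so that F equals 0 and 1 at the ends.
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    cum = cdf(cuts[None, :] - index[:, None])                  # F(kappa_j - x'beta)
    return np.diff(cum, axis=1)                                # F_j - F_{j-1}

# Hypothetical values: two regressors, J = 4 categories.
probs = ordered_probs(x=[[0.5, -1.0]], beta=[0.8, 0.3], kappa=[-1.0, 0.0, 1.2])
```

Each row of `probs` holds the J category probabilities for one observation; the ordering requirement on the thresholds is what keeps every entry positive.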
Ordered response models are usually motivated by an underlying continuous
but latent process y∗ together with a response mechanism of the
form
In order to identify the parameters of the model we have to fix location and
scale of the argument in F , the former by assuming that x does not contain
a constant term, the latter by normalizing the variance of the distribution
function F . Then, equation (2) represents a well-defined probability func-
tion which allows for straightforward application of maximum likelihood
methods for a random sample of size n of pairs (y, x).
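The following is a rough sketch of such a maximum likelihood fit for the ordered probit case, on simulated data. It is not the authors' implementation: the helper names `unpack` and `neg_loglik`, the reparameterization of the thresholds through positive increments, and the simulated design are all assumptions made for illustration. The regressors contain no constant and the error has unit variance, matching the location and scale normalizations described above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def unpack(params, k):
    """Split the parameter vector into slopes and strictly increasing thresholds."""
    beta = params[:k]
    # The first threshold is free; later ones add positive increments exp(.).
    kappa = params[k] + np.concatenate(([0.0], np.cumsum(np.exp(params[k + 1:]))))
    return beta, kappa

def neg_loglik(params, y, X):
    beta, kappa = unpack(params, X.shape[1])
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    index = X @ beta
    # y is coded 1..J, so cuts[y] is kappa_j and cuts[y - 1] is kappa_{j-1}.
    p = norm.cdf(cuts[y] - index) - norm.cdf(cuts[y - 1] - index)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

# Simulated sample (hypothetical parameter values).
rng = np.random.default_rng(0)
n, true_beta, true_kappa = 3000, np.array([1.0, -0.5]), np.array([-0.5, 0.5])
X = rng.normal(size=(n, 2))                    # no constant term
y_star = X @ true_beta + rng.normal(size=n)    # latent variable, unit error variance
y = 1 + np.searchsorted(true_kappa, y_star)    # observed categories 1..3

start = np.zeros(X.shape[1] + len(true_kappa))
start[X.shape[1]] = -1.0
fit = minimize(neg_loglik, start, args=(y, X), method="BFGS")
beta_hat, kappa_hat = unpack(fit.x, X.shape[1])
```

Writing the thresholds as a first cut point plus exponentiated increments is one convenient way to impose κj > κj−1 without constrained optimization.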
The most natural way to interpret ordered response models (and discrete
probability models in general) is to determine how a marginal change in
one regressor changes the distribution of the outcome variable, i.e. all the
outcome probabilities. These marginal probability effects can be calculated
as
\[
MPE_{jl}(x) = \frac{\partial \Pr[y = j \mid x]}{\partial x_l}
            = \left[ f(\kappa_{j-1} - x'\beta) - f(\kappa_j - x'\beta) \right] \beta_l \tag{3}
\]
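To make the structure of (3) concrete, the sketch below evaluates the marginal probability effects for an ordered probit at one point x and counts how often their sign changes across categories. The function name `marginal_effects` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def marginal_effects(x, beta, kappa, pdf=norm.pdf):
    """MPE_{jl}(x) as a (J, k) array: rows are categories, columns are regressors."""
    x, beta = np.asarray(x, float), np.asarray(beta, float)
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    dens = pdf(cuts - x @ beta)               # f(kappa_j - x'beta), zero at -inf and +inf
    # [f(kappa_{j-1} - x'beta) - f(kappa_j - x'beta)] * beta_l
    return np.outer(dens[:-1] - dens[1:], beta)

mpe = marginal_effects(x=[0.5, -1.0], beta=[0.8, 0.3], kappa=[-1.0, 0.0, 1.2])
signs = np.sign(mpe[:, 0])                    # effects of the first regressor by category
sign_changes = int(np.sum(np.diff(signs) != 0))
```

The effects sum to zero over the categories because the probabilities sum to one, and for these values the sign pattern of the first regressor is negative, negative, positive, positive, that is, a single crossing.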
Three assumptions of the standard model are responsible for its limitations
in analyzing marginal probability effects: First, the single index assumption,
4 STEFAN BOES and RAINER WINKELMANN
second, the constant threshold assumption, and third, the distributional as-
sumption which does not allow for additional individual heterogeneity be-
tween individual realizations. While relaxing these assumptions we want to
retain the possibility of interpreting the model in terms of marginal prob-
ability effects. Therefore, we need to search for a richer class of parametric
models that does not impose restrictions such as constant relative effects or
single crossing. In this section we present four such alternatives.
where it is understood that κ̃0 = −∞ and κ̃J = ∞, as before. The last equal-
ity in (5) follows because γj and β cannot be identified separately with the
same x entering the index function and the generalized thresholds, and we
define βj ≡ β − γj . The cumulative probabilities define a probability den-
sity function in the same manner as in (2) and parameters can be estimated
directly by maximum likelihood. A non-linear specification can be used to
ensure that κ̃j−1 − x′βj−1 < κ̃j − x′βj for all κ̃, β̃ and x (e.g. Ronning,
1990). We observe that the generalized threshold model nests the standard
model under the restrictions β1 = . . . = βJ−1 and therefore both models
can be tested against each other by performing a likelihood ratio (LR) test.
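A small sketch may help to see both the nesting and the ordering requirement. It assumes cumulative probabilities of the form Pr[y ≤ j|x] = F(κ̃j − x′βj) with a probit F; the function name `generalized_threshold_probs` and all numbers are hypothetical. Rather than imposing the ordering through a non-linear specification, the sketch simply checks it at the given x.

```python
import numpy as np
from scipy.stats import norm

def generalized_threshold_probs(x, betas, kappa, cdf=norm.cdf):
    """Pr[y = j | x], j = 1..J, with one slope vector beta_j per threshold.

    betas has shape (J-1, k); kappa has length J-1.
    """
    x = np.asarray(x, float)
    args = np.asarray(kappa, float) - np.asarray(betas, float) @ x  # kappa_j - x'beta_j
    if np.any(np.diff(args) <= 0):
        # kappa_{j-1} - x'beta_{j-1} < kappa_j - x'beta_j fails at this x.
        raise ValueError("cumulative probabilities are not increasing at this x")
    cum = np.concatenate(([0.0], cdf(args), [1.0]))
    return np.diff(cum)

x = [0.5, -1.0]
kappa = [-1.0, 0.0, 1.2]
beta = np.array([0.8, 0.3])

# Restricted case beta_1 = ... = beta_{J-1}: reproduces the standard model.
p_restricted = generalized_threshold_probs(x, np.tile(beta, (3, 1)), kappa)
# Unrestricted case: the slope of the first regressor differs by threshold.
p_general = generalized_threshold_probs(x, [[0.8, 0.3], [0.5, 0.3], [0.1, 0.3]], kappa)
```

With k regressors and J categories, the restriction of equal slope vectors removes (J − 2)k parameters, which is the number of degrees of freedom of the LR test.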
The generalized threshold model provides a framework in which marginal
probability effects can be analyzed with much more flexibility than in the
standard model, since
β = β̃ + ε
effects in the random coefficient model are more flexible than those in the
standard model since the sign of the second term is indeterminate.
The random coefficients model can be estimated directly by the method
of maximum likelihood with heteroscedasticity corrected index function.
However, some caution is required in running the optimization routines.
Although the parameters of the model are identified by functional form,
the specific structure of the model might cause problems in some datasets.
Specifically, certain values of Ω, ψ and x can drive $\sigma_{\tilde{u}}^2$ to be negative or its
square root to be almost linear in the parameters, such that the argument
in F gets complex or is not identified, respectively. Nevertheless, if the data
support the model, we should find reasonable estimates of the elements in
Ω and ψ.
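One simple safeguard during estimation is to evaluate the implied variance at every observation for the current parameter values. The sketch below assumes the variance takes the form σũ² = x′Ωx + 2x′ψ + 1, which is the form implied by a random coefficient β = β̃ + ε with Var(ε) = Ω, Cov(ε, u) = ψ and a unit-variance error u; the function name `index_variance` and the numbers are hypothetical.

```python
import numpy as np

def index_variance(X, omega, psi):
    """sigma_u^2 = x' Omega x + 2 x' psi + 1 for every row x of X (assumed form)."""
    X = np.atleast_2d(np.asarray(X, float))
    quad = np.einsum("ij,jk,ik->i", X, np.asarray(omega, float), X)  # x' Omega x
    return quad + 2.0 * X @ np.asarray(psi, float) + 1.0

# Hypothetical trial values, as an optimizer might propose them.
X = np.array([[1.0, 0.2], [3.0, -1.0]])
omega = np.array([[0.10, 0.0], [0.0, 0.05]])
psi = np.array([-0.40, 0.0])

var = index_variance(X, omega, psi)
bad_rows = np.flatnonzero(var <= 0)   # observations where the square root is not real
```

Here the second observation yields a negative variance, so the argument of F could not be evaluated there; an optimizer that treats Ω and ψ as free parameters can wander into such regions.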
where the πc’s sum up to unity. The probability density function of the
ordered outcome is given by Pr[y = j|x] = Σc πc [Fcj(x) − Fcj−1(x)], and
marginal probability effects can be obtained, as before, by taking the first
order derivative with respect to xl
\[
MPE_{jl}(x) = \sum_{c=1}^{C} \pi_c \left[ f_{cj-1}(x) - f_{cj}(x) \right] \beta_{cl} \tag{10}
\]
\[
\ln L(\theta, \pi \mid y, x, z) = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij}
  \ln \left\{ \sum_{c=1}^{C} \pi_c \left[ F_{cj}(x_i) - F_{cj-1}(x_i) \right] \right\}
\]
where θ and π are shorthand notation for the vectors of class specific pa-
rameters θc (which include thresholds and slopes) and probabilities πc , re-
spectively, and yij is a binary variable indicating whether yi = j. The multi-
modality of the log-likelihood function and the large number of parameters
for increasing C might cause the optimization routines to be slow in find-
ing the global maximum. Furthermore, although the probability function of
the complete mixture might be well-defined, the probabilities in a subset of
classes can turn negative. An alternative approach of getting the maximum
likelihood estimates that circumvents these problems is to formulate the
model as an incomplete data problem and to apply the EM algorithm of
Dempster et al. (1977).
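Before turning to the EM formulation, it may help to see how the mixture log-likelihood itself is evaluated. The sketch below assumes ordered probit components with constant class probabilities πc; the function names `class_probs` and `mixture_loglik`, the simulated inputs, and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def class_probs(X, beta, kappa):
    """Ordered probit probabilities F_cj(x) - F_cj-1(x) for one class; shape (n, J)."""
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    return np.diff(norm.cdf(cuts[None, :] - (X @ beta)[:, None]), axis=1)

def mixture_loglik(y, X, pis, betas, kappas):
    """Sum over i of ln sum_c pi_c [F_cj(x_i) - F_cj-1(x_i)] at the observed j = y_i."""
    rows = np.arange(len(y))
    per_class = np.stack([class_probs(X, b, k)[rows, y - 1]
                          for b, k in zip(betas, kappas)])   # shape (C, n)
    return np.sum(np.log(pis @ per_class))

# Hypothetical inputs: n = 200, two regressors, J = 3 categories, C = 2 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = rng.integers(1, 4, size=200)
pis = np.array([0.6, 0.4])
betas = [np.array([1.0, 0.0]), np.array([-0.5, 0.5])]
kappas = [np.array([-0.5, 0.5]), np.array([0.0, 1.0])]
ll = mixture_loglik(y, X, pis, betas, kappas)
```

Because the class-specific probabilities enter inside the logarithm of a sum, the objective does not separate across classes, which is one reason direct maximization can be slow and multimodal.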
To be more specific, let mc denote a binary variable indicating individ-
ual class membership which can be interpreted as independent realizations
of a C-component multinomial distribution with component probabilities
πc , the prior probability of belonging to class c. The (complete-data) log-
likelihood function for a random sample of size n conditional on observed
class membership m can be written as
\[
\ln L(\theta, \pi \mid y, x, m) = \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{c=1}^{C} y_{ij} m_{ci}
  \left\{ \ln \pi_c + \ln \left[ F_{cj}(x_i) - F_{cj-1}(x_i) \right] \right\} \tag{11}
\]
Since we cannot observe individual class membership, that is the data are
incomplete, we cannot maximize this log-likelihood function directly.
The EM algorithm proceeds iteratively in two steps, based on an E-
step in which the expectation of (11) is taken with respect to m given the
observed data and the current fit of θ and π, and an M-step in which the
log-likelihood function (11) is maximized with respect to θ and π given
expected individual class membership. The linearity of the complete-data
log-likelihood in m allows for direct calculation of the expected individual
class membership given the observed data and the parameters obtained in
the q-th iteration step. This expectation corresponds to the probability of
the ith entity belonging to class c, henceforth called posterior probability τc .
From the assumptions above or simply by Bayes’ theorem it can be shown
that
\[
\tau_c\left(y, x; \theta^{(q)}, \pi^{(q)}\right) =
  \frac{\pi_c^{(q)} \left[ F_{cj}^{(q)}(x) - F_{cj-1}^{(q)}(x) \right]}
       {\sum_{c=1}^{C} \pi_c^{(q)} \left[ F_{cj}^{(q)}(x) - F_{cj-1}^{(q)}(x) \right]} \tag{12}
\]
where $F_{cj}^{(q)}$ denotes the value of F evaluated at the parameters obtained
in the q-th iteration step. These probabilities can be used to analyze the
characteristics of each class, i.e. we can assign each individual to the class
for which its probability is the highest and then derive descriptive statistics
or marginal probability effects per class.
The M-step replaces mc in (11) by its expectation, τc , and therefore con-
siders the expected log-likelihood to be maximized. Again, the linearity in
(11) provides a substantial simplification of the optimization routine. First,
updated estimates $\pi_c^{(q+1)}$ can be obtained directly by taking the sample
average $n^{-1} \sum_i \tau_c(\cdot)$, where 0 ≤ τc(·) ≤ 1 (see (12)). Second, each class
can be maximized separately with respect to θc to get updated estimates
$\theta_c^{(q+1)}$, taking into account the multiplicative factor τc. In other words, we
can estimate C simple ordered probits or logits while weighting the data
appropriately, and alternate the E- and M-steps until the change in the
log-likelihood value between iterations is sufficiently small.
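The two steps can be sketched as follows for a mixture of ordered probits. This is an illustrative sketch rather than the authors' code: the function names, the threshold parameterization through positive increments, the use of BFGS for the weighted M-step, the simulated data, and the starting values are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def cell_probs(params, y, X):
    """F_cj(x_i) - F_cj-1(x_i) at the observed category of each observation."""
    k = X.shape[1]
    kappa = params[k] + np.concatenate(([0.0], np.cumsum(np.exp(params[k + 1:]))))
    cuts = np.concatenate(([-np.inf], kappa, [np.inf]))
    index = X @ params[:k]
    return np.clip(norm.cdf(cuts[y] - index) - norm.cdf(cuts[y - 1] - index), 1e-300, None)

def em_ordered_probit_mixture(y, X, thetas, pis, n_iter=50, tol=1e-6):
    thetas, pis, loglik_path = [t.copy() for t in thetas], pis.copy(), []
    for _ in range(n_iter):
        # E-step: posterior class probabilities tau_c as in equation (12).
        joint = np.stack([p * cell_probs(t, y, X) for p, t in zip(pis, thetas)])
        loglik_path.append(np.sum(np.log(joint.sum(axis=0))))
        tau = joint / joint.sum(axis=0)
        if len(loglik_path) > 1 and loglik_path[-1] - loglik_path[-2] < tol:
            break
        # M-step, part 1: class shares are the sample averages of tau.
        pis = tau.mean(axis=1)
        # M-step, part 2: one tau-weighted ordered probit per class.
        for c in range(len(thetas)):
            obj = lambda t, w=tau[c]: -np.sum(w * np.log(cell_probs(t, y, X)))
            thetas[c] = minimize(obj, thetas[c], method="BFGS").x
    return thetas, pis, tau, loglik_path

# Simulated two-class sample (hypothetical values): slopes +2 and -2, J = 3.
rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 1))
cls = rng.random(n) < 0.5
y_star = np.where(cls, 2.0, -2.0) * X[:, 0] + rng.normal(size=n)
y = 1 + np.searchsorted([-0.5, 0.5], y_star)

# Each theta holds (slope, first threshold, log of the threshold increment).
start = [np.array([1.0, -0.5, 0.0]), np.array([-1.0, -0.5, 0.0])]
thetas, pis, tau, path = em_ordered_probit_mixture(y, X, start, np.array([0.5, 0.5]))
```

The returned `tau` can be used to assign each observation to its most likely class, and `path` records the observed-data log-likelihood, which should not decrease from one iteration to the next.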
3.4. Sequential Model. The last alternative for a flexible ordered re-
sponse model adopts methods from the literature on discrete time duration
data. In this literature, the main quantity of interest is the conditional exit
probability (or “hazard rate”) Pr[y = j|y ≥ j, x], where y is the duration
of the spell and j is the time of exit. The key insight is that such discrete
time hazard rate models can be used for any ordered response y. Once
the conditional transition probabilities are determined, the unconditional
probabilities are obtained from the recursive relationship
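\[
\Pr[y = j \mid x] = \Pr[y = j \mid y \ge j, x] \, \Pr[y \ge j \mid x], \qquad j = 1, \ldots, J \tag{13}
\]
where
\[
\Pr[y \ge 1 \mid x] = 1, \qquad
\Pr[y \ge j \mid x] = \prod_{r=1}^{j-1} \left\{ 1 - \Pr[y = r \mid y \ge r, x] \right\}, \qquad j = 2, \ldots, J \tag{14}
\]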
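The recursion in (13) and (14) is easy to check numerically. The sketch below takes the J − 1 conditional probabilities as given and uses the fact that Pr[y = J|y ≥ J, x] = 1; the function name `sequential_probs` and the hazard values are hypothetical.

```python
import numpy as np

def sequential_probs(hazards):
    """Unconditional Pr[y = j | x], j = 1..J, from Pr[y = j | y >= j, x], j = 1..J-1."""
    # The last category is reached with certainty: Pr[y = J | y >= J, x] = 1.
    h = np.append(np.asarray(hazards, float), 1.0)
    # Pr[y >= j | x] from equation (14), with Pr[y >= 1 | x] = 1.
    survival = np.concatenate(([1.0], np.cumprod(1.0 - h[:-1])))
    return h * survival                       # equation (13)

probs = sequential_probs([0.2, 0.5, 0.25])    # yields 0.2, 0.4, 0.1, 0.3
```

In an application each conditional probability would itself be a binary model, for example F(κj − x′βj), so that the slopes can differ freely across categories.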
Clearly, these effects are very flexible, as they can vary by category and
do not rely on a single crossing property or constant relative effects. The
sequential model and the standard model are nonnested models and one
may use information based measures like the Akaike Information Criterion
(AIC) as a model selection criterion. Moreover, for the problem of choosing
among the generalized alternatives the same strategy is advisable.
4. Empirical Illustration
In order to illustrate the benefit of the generalized ordered response models
we analyze the effect of income on happiness using data from the German
Socio-Economic Panel (GSOEP; see also Boes and Winkelmann, 2004). The
relationship between income and happiness was studied before in a number
of papers (see, for example, Easterlin, 1973, 1974; Scitovsky, 1975; Frey and
Stutzer, 2000, 2002; Shields and Wheatley Price, 2004 and the references
therein) and has gained renewed interest in the recent literature because
of its use for valuation of public goods or intangibles (see, for example,
Winkelmann and Winkelmann, 1998; Frey, Luechinger, and Stutzer, 2002;
van Praag and Baarsma, 2005).
We used data from the 1997 wave of the GSOEP and selected a sample of
1735 men aged between 25 and 65. The dependent variable happiness with
originally 11 categories was recoded to avoid cells with low frequency and,
after merging the lower categories 0/1/2 and 3/4, we retained a total of J =
8 ordered response categories. We included among the regressors logarithmic
family income and logarithmic household size as well as a quadratic form
in age, and two dummy variables indicating good health status as well as
unemployment.
In our regression analysis, we assumed that F is the cumulative distribution
function of the standard normal distribution. The random coefficients model
was simplified by restricting Ω and ψ such that $\sigma_{\tilde{u}}^2 = \Omega_{ll} x_l^2 + 2\psi_l x_l + 1$,
5. Concluding Remarks
In this paper we argued that the standard ordered probit and ordered logit
models, while commonly used in applied work, are characterized by some
restrictive and therefore undesirable properties. We then discussed four
generalized models, namely the generalized threshold, the random coeffi-
cients, the finite mixture, and the sequential model. All of them are sub-
stantially more flexible in analyzing marginal probability effects since they
do not rely on constant relative effects or a single crossing property.
An illustrative application with data from the 1997 wave of the GSOEP
dealt with the relationship between income and happiness. We asked how a
one-percent increase in income is predicted to change the happiness distribu-
tion, ceteris paribus. The analysis showed that the estimated marginal prob-
ability effects differed markedly between the standard ordered probit model
and the probit-specified alternatives. For example, a negative marginal effect
for the highest answer category (as predicted by the generalized threshold
model) is ruled out by assumption in the standard model.
As is not uncommon with such generalizations, they can be computationally
burdensome due to the larger number of parameters, restrictions on the
parameter space, or multimodality of the likelihood function. Nevertheless,
the greater flexibility and enhanced interpretation possibilities should
render these alternative models indispensable tools in all research situations
where accurate estimation of the marginal probability effects over the
entire range of the outcome distribution is of interest.
References
Aitchison, J., and S.D. Silvey (1957). The Generalization of Probit Analysis
to the Case of Multiple Responses. Biometrika 44 131–140.
Agresti, A. (1999). Modelling Ordered Categorical Data: Recent Advances and
Future Challenges. Statistics in Medicine 18 2191–2207.
Anderson, J.A. (1984). Regression and Ordered Categorical Variables. Journal
of the Royal Statistical Society. Series B (Methodological) 46 1–30.
Barnhart, H.X., and A.R. Sampson (1994). Overview of Multinomial Models
for Ordered Data. Communications in Statistics – A. Theory and Methods 23
3395–3416.