Msfe Week9
There are several situations in which the variable we want to explain can
take only two possible values. This is typically the case when we want to model
the choice of an individual. For example: a married woman's choice of entering
the labor force, 1 if she enters, 0 otherwise; or the choice of dropping out of
school or staying in, 1 if the individual drops out, 0 otherwise. Also, consider
a model investigating the voting decision of an individual, who will either go
to vote (by convention, y = 1) or not (y = 0). This is why these models are
called binary choice models: they explain a (0/1) dependent variable.
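Consider first the linear regression model
$$y_i = \beta_1 + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + u_i.$$
Note that as $y_i = 1$ or $0$, we cannot interpret the estimated $\beta$ in the
usual way, i.e. we cannot say that by increasing $x_{2,i}$ by one unit we
increase $y_i$ by $\beta_2$. As $y_i = 1$ or $0$, then
$$E(y_i \mid x_i) = P(y_i = 1 \mid x_i),$$
where $x_i = (x_{2,i}, \dots, x_{k,i})'$, and so
$$P(y_i = 1 \mid x_i) = \beta_1 + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i},$$
that is, $\beta_2$ measures the change in the probability that $y_i = 1$ when
$x_{2,i}$ increases by one unit.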
You can immediately see that the linear probability model is rather
nonsensical. In fact, for some values of the regressors it can predict
probabilities which are negative or greater than one.
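The problem can be seen in a minimal sketch. The ten data points below are made up for illustration (they are not from these notes), and the OLS formulas for a regression on a constant and one regressor are coded by hand to keep the example self-contained.

```python
# Toy data (hypothetical): y switches from 0 to 1 as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a regression of y on a constant and x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Fitted values are the "predicted probabilities" of the linear probability model.
fitted = [b1 + b2 * xi for xi in x]

print(f"intercept = {b1:.4f}, slope = {b2:.4f}")
print(f"smallest fitted value = {min(fitted):.4f}")
print(f"largest fitted value  = {max(fitted):.4f}")
```

With these data the fitted value at x = 0 is below zero and the fitted value at x = 9 is above one, so neither can be read as a probability.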
Thus, $u_i$ can take only two values, $1 - P(y_i = 1 \mid x_i)$ or
$-P(y_i = 1 \mid x_i)$, so that
$\mathrm{Var}(u_i \mid x_i) = P(y_i = 1 \mid x_i)\,(1 - P(y_i = 1 \mid x_i))$;
that is, we have a variance which changes for each observation, and therefore
heteroscedasticity. In this case the form of heteroscedasticity is perfectly
known a priori, i.e. we know the structure of
$h_i = x_i'\beta\,(1 - x_i'\beta)$ (here $x_i'\beta$ is shorthand for the full
linear index $\beta_1 + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i}$), and
therefore we can apply a feasible weighted least squares estimator using
$\hat{h}_i = x_i'\hat{\beta}\,(1 - x_i'\hat{\beta})$. More precisely, we
premultiply both the dependent variable and the regressors by
$\hat{h}_i^{-1/2}$. Note that $\hat{\beta}$ is the standard OLS estimator
obtained by regressing $y_i$ on a constant, $x_{2,i}, \dots, x_{k,i}$.
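The two-step procedure can be sketched as follows, assuming a single regressor plus a constant. The simulated data, the helper name `weighted_ls`, and the clipping of fitted values are all assumptions made for this illustration; in particular, the notes do not say what to do when a fitted value falls outside (0, 1), where the estimated variance would be zero or negative.

```python
import random

def weighted_ls(x, y, w):
    """Least squares of y on a constant and x with weights w.

    Equivalent to OLS after multiplying y, the constant and x by sqrt(w).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Simulated data (assumption): P(y = 1 | x) = 0.2 + 0.6 x, x uniform on [0, 1).
rng = random.Random(0)
n = 2000
x = [rng.random() for _ in range(n)]
y = [1 if rng.random() < 0.2 + 0.6 * xi else 0 for xi in x]

# Step 1: OLS, i.e. all weights equal to one.
b1_ols, b2_ols = weighted_ls(x, y, [1.0] * n)

# Step 2: estimated variances h_i = p_i (1 - p_i), with p_i the OLS fitted value.
# Clipping p_i into [0.01, 0.99] is an ad hoc safeguard, not part of the notes.
p_hat = [min(max(b1_ols + b2_ols * xi, 0.01), 0.99) for xi in x]
h_hat = [p * (1 - p) for p in p_hat]

# Step 3: weighted least squares with weights 1 / h_i, which is the same as
# premultiplying the dependent variable and the regressors by h_i^(-1/2).
b1_fwls, b2_fwls = weighted_ls(x, y, [1 / h for h in h_hat])

print(f"OLS : intercept = {b1_ols:.3f}, slope = {b2_ols:.3f}")
print(f"FWLS: intercept = {b1_fwls:.3f}, slope = {b2_fwls:.3f}")
```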
Consider the choice, faced by married women, of joining the labor force or not.
Say $y_i = 1$ if woman $i$ joins and $y_i = 0$ if she does not. Intuitively,
this choice depends on variables that can take a continuum of values, such as
the husband's income or the household savings, or on variables taking a number
of different values, such as years of education. Hereafter, let inlf be a binary
variable equal to 1 if the woman joins the labor force and equal to 0 otherwise;
hinc denotes the husband's income, educ education, exper experience, klt6 the
number of kids less than 6 years old, and kgt6 the number of kids between 6 and
18 years old. Estimating the model, we obtain
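$$
\begin{aligned}
\mathit{inlf}_i ={}& .586\,(.154) + .0034\,(.0014)\,\mathit{hinc}_i + .038\,(.007)\,\mathit{educ}_i \\
& + .039\,(.006)\,\mathit{exper}_i - .0006\,(.00018)\,\mathit{exper}_i^2 \\
& - .016\,(.002)\,\mathit{age}_i - .262\,(.034)\,\mathit{klt6}_i + .13\,(.013)\,\mathit{kgt6}_i
\end{aligned}
$$
(standard errors in parentheses).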
Models for Binary Choices: Logit and Probit
The linear probability model is characterized by the fact that we model the
probability as a linear function of the regressors,
$P(y_i = 1 \mid x_i) = x_i'\beta$.
There are three main issues with the linear probability model: (i) it can
predict probabilities which are negative or larger than one; (ii) a unit change
in a regressor can induce an increase or decrease in probability larger than 1;
(iii) a one-unit change in a regressor has a constant effect.
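We can use a more flexible specification, in which
$$P(y_i = 1 \mid x_i) = F(x_i; \beta) = F(x_i'\beta),$$
where $F$ is a cumulative distribution function, so that the predicted
probability always lies between 0 and 1. This specification can be motivated by
a latent variable $y_i^*$ satisfying
$$y_i^* = x_i'\beta + u_i.$$
However, you only observe $y_i = 1$ if $y_i^* \ge 0$ and $y_i = 0$ if
$y_i^* < 0$; more formally $y_i = 1\{y_i^* \ge 0\}$, where $1\{\cdot\}$ denotes
the indicator function, which is equal to 1 if the event in brackets is true and
equal to zero otherwise. The probit model is the case in which $u_i$ is normally
distributed, and the logit the case in which $u_i$ has a logistic distribution.
With this latent variable interpretation in mind,
$$P(y_i = 1 \mid x_i) = P(u_i \ge -x_i'\beta \mid x_i) = F(x_i'\beta),$$
where the last equality uses the symmetry of the distribution of $u_i$ around
zero.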
What is the meaning of $\beta$? In terms of the latent variable $y_i^*$, the
coefficients have the usual meaning, i.e.
$\partial y_i^* / \partial x_{k,i} = \beta_k$. However, we do not observe
$y_i^*$; we only observe whether $y_i^* \ge 0$. What we are really interested in
is the effect of a change in $x_{k,i}$ on $P(y_i = 1 \mid x_i)$. We shall see
that the magnitude of $\beta$ is not particularly relevant, though its sign is.
If the regressor $x_{k,i}$ is a continuous random variable (e.g. husband's
income), then the partial effect of $x_{k,i}$ on $P(y_i = 1 \mid x_i)$ is given
by
$$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{k,i}} = f(x_i'\beta)\,\beta_k,$$
where $f$ is the density associated with $F$; e.g. in the probit model $F$ is
the cumulative distribution function (CDF) of a normal, and $f$ is the normal
density. Thus the partial effect of a change in $x_{k,i}$ on
$P(y_i = 1 \mid x_i)$ is given by $f(x_i'\beta)\,\beta_k$.
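As a quick check of this formula, the sketch below compares the analytic partial effect in a probit model with a finite-difference approximation of the derivative of the probability. The coefficient and regressor values are hypothetical, chosen only for illustration; `NormalDist` is the standard normal distribution from Python's standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: cdf plays the role of F, pdf of f

# Hypothetical coefficients (constant, beta_2, beta_3) and regressor values.
beta = [-0.5, 0.8, -0.3]
x = [1.0, 1.2, 2.0]  # the leading 1.0 multiplies the constant

def index(x, beta):
    """Linear index x'beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def prob_one(x, beta):
    """P(y = 1 | x) = F(x'beta) for the probit model."""
    return std_normal.cdf(index(x, beta))

k = 1  # position of the regressor whose partial effect we want (beta_2 here)

# Analytic partial effect: f(x'beta) * beta_k.
analytic = std_normal.pdf(index(x, beta)) * beta[k]

# Numerical check: central finite difference of P(y = 1 | x) with respect to x_k.
eps = 1e-6
x_up = x.copy()
x_up[k] += eps
x_dn = x.copy()
x_dn[k] -= eps
numerical = (prob_one(x_up, beta) - prob_one(x_dn, beta)) / (2 * eps)

print(f"analytic  partial effect: {analytic:.6f}")
print(f"numerical partial effect: {numerical:.6f}")
```

The two numbers should agree to several decimal places. Changing the values in `x` changes the partial effect even though `beta` stays the same.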
(i) The partial effect of a change in $x_{k,i}$ on $P(y_i = 1 \mid x_i)$ is NOT
constant, but is a nonlinear function of $x_i$; in particular, the magnitude of
$\beta_k$ does not by itself determine the magnitude of the effect.
(ii) As $f(x_i'\beta) > 0$ for any value of $x$ (the normal and logistic
densities are strictly positive), the sign of the effect is determined by
$\beta_k$.
(iii) When $f$ is a density that is symmetric and unimodal around 0, such as the
normal, the mode is at zero, so the highest marginal effect occurs when
$x_i'\beta$ is close to zero.
(iv) The relative partial effect of $x_{k,i}$ and $x_{j,i}$ on
$P(y_i = 1 \mid x_i)$ is constant and is given by
$$\frac{f(x_i'\beta)\,\beta_k}{f(x_i'\beta)\,\beta_j} = \frac{\beta_k}{\beta_j}.$$
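Points (i) to (iv) can be illustrated numerically. The sketch below uses a logit model with hypothetical coefficients and evaluates the partial effects of two regressors at several values of the first one, holding the second fixed at 1.

```python
import math

def logistic_pdf(z):
    """Density of the standard logistic distribution: F(z) * (1 - F(z))."""
    cdf = 1.0 / (1.0 + math.exp(-z))
    return cdf * (1.0 - cdf)

# Hypothetical logit coefficients: constant, beta_2 (positive), beta_3 (negative).
b1, b2, b3 = 0.2, 1.0, -0.5

rows = []
for x2 in [-4.0, -2.0, 0.3, 2.0, 4.0]:
    z = b1 + b2 * x2 + b3 * 1.0   # linear index x'beta with x3 = 1
    pe2 = logistic_pdf(z) * b2    # partial effect of x2
    pe3 = logistic_pdf(z) * b3    # partial effect of x3
    rows.append((x2, z, pe2, pe3))
    print(f"x2 = {x2:5.1f}  index = {z:5.2f}  "
          f"pe2 = {pe2:.4f}  pe3 = {pe3:.4f}  ratio = {pe2 / pe3:.2f}")
```

The partial effects vary across rows (i), keep the sign of their coefficient (ii), are largest in absolute value in the row where the index is zero (iii), and their ratio equals b2 / b3 in every row (iv).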
Consider the conditional density of $y \mid x$. As
$P(y_i = 1 \mid x_i) = F(x_i'\beta)$ and
$P(y_i = 0 \mid x_i) = 1 - F(x_i'\beta)$, we have that
$$L_i(\beta; x_i) = F(x_i'\beta)^{y_i}\,(1 - F(x_i'\beta))^{1 - y_i}, \qquad y_i = 0, 1.$$
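To make things simpler we take logs (note that maximizing the likelihood
function or the log-likelihood function gives exactly the same maximizer, since
the logarithm is strictly increasing):
$$l_i(\beta; x_i) = \log L_i(\beta; x_i) = y_i \log F(x_i'\beta) + (1 - y_i)\log(1 - F(x_i'\beta)).$$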
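A small sketch of the log-likelihood for the probit case may help fix ideas. The five observations and the two candidate parameter vectors are made up for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

F = NormalDist().cdf  # probit: F is the standard normal CDF

def log_lik_i(y_i, x_i, beta):
    """Contribution l_i = y_i log F(x_i'beta) + (1 - y_i) log(1 - F(x_i'beta))."""
    p = F(sum(xj * bj for xj, bj in zip(x_i, beta)))
    return y_i * math.log(p) + (1 - y_i) * math.log(1 - p)

# Toy sample (hypothetical): each x_i is (constant, regressor).
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 1, 1]

def avg_log_lik(beta):
    """Average log-likelihood: the mean of l_i(beta; x_i) over the sample."""
    return sum(log_lik_i(yi, xi, beta) for yi, xi in zip(y, X)) / len(y)

good = avg_log_lik([0.0, 1.0])   # slope with the sign suggested by the data
bad = avg_log_lik([0.0, -1.0])   # slope with the opposite sign

print(f"average log-likelihood at beta = (0,  1): {good:.4f}")
print(f"average log-likelihood at beta = (0, -1): {bad:.4f}")
```

The parameter vector that fits the pattern in the data gets the higher (less negative) log-likelihood, which is the quantity the estimator tries to maximize.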
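We are now ready to define the Maximum Likelihood Estimator for a logit or
probit model:
$$
\begin{aligned}
\hat{\beta}_{ML} &= \arg\max_{\beta}\; n^{-1} \sum_{i=1}^{n} l_i(\beta; x_i) \\
&= \arg\max_{\beta}\; n^{-1} \sum_{i=1}^{n} \left[ y_i \log F(x_i'\beta) + (1 - y_i) \log\left(1 - F(x_i'\beta)\right) \right].
\end{aligned}
$$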
Now, when $F$ is the normal CDF, $\hat{\beta}_{ML}$ is the probit ML estimator,
while when $F$ is the logistic function, $\hat{\beta}_{ML}$ is the logit
estimator.
Note that we cannot find a closed-form expression for $\hat{\beta}_{ML}$; we can
only obtain it by numerical maximization of the likelihood function.
However, under regularity conditions $\hat{\beta}_{ML}$ is consistent for
$\beta$ and valid standard errors can be computed.
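One common way to carry out the numerical maximization is Newton-Raphson; the notes do not say which algorithm is intended, so the following is only a sketch under that assumption. It fits a logit with a constant and one regressor on simulated data and reports standard errors taken from the inverse of the negative Hessian, a standard choice. The function name `fit_logit` and the data-generating values are hypothetical.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Logit ML for P(y = 1 | x) = F(b1 + b2 x) by Newton-Raphson.

    Returns the estimates (b1, b2) and their standard errors.
    """
    b1, b2 = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, yi in zip(x, y):
            p = logistic_cdf(b1 + b2 * xi)
            w = p * (1.0 - p)
            g1 += yi - p
            g2 += (yi - p) * xi
            h11 += w
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        s1 = (h22 * g1 - h12 * g2) / det
        s2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + s1, b2 + s2
        if abs(s1) + abs(s2) < tol:
            break
    # Standard errors from the inverse of the negative Hessian, evaluated at the
    # iterate just before the final (negligible) step.
    se1 = math.sqrt(h22 / det)
    se2 = math.sqrt(h11 / det)
    return (b1, b2), (se1, se2)

# Simulated data (assumption): true constant 0, true slope 1.
rng = random.Random(1)
n = 500
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y = [1 if rng.random() < logistic_cdf(0.0 + 1.0 * xi) else 0 for xi in x]

(b1_hat, b2_hat), (se1, se2) = fit_logit(x, y)
print(f"constant: {b1_hat:.3f} (s.e. {se1:.3f})")
print(f"slope   : {b2_hat:.3f} (s.e. {se2:.3f})")
```

Starting from zero usually works for the logit because its log-likelihood is concave, but the iteration would fail if the data were perfectly separated.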
The most common approach to testing restrictions on $\beta$ in the context of
the probit or logit model is the LIKELIHOOD RATIO TEST.
Basically, we estimate the unrestricted model by ML, so we can compute
$L_u(\hat{\beta}_{ML}) = n^{-1} \sum_{i=1}^{n} l_{i,u}(\hat{\beta}_{ML}; x_i)$,
and then we estimate the restricted model by ML, so we can compute
$L_r(\hat{\beta}_{ML}) = n^{-1} \sum_{i=1}^{n} l_{i,r}(\hat{\beta}_{ML}; x_i)$.
The likelihood ratio test is given by:
$$LR = 2n \left( L_u(\hat{\beta}_{ML}) - L_r(\hat{\beta}_{ML}) \right).$$
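The statistic can be computed as in the sketch below, which tests the single restriction that the slope is zero in a logit with one regressor. The data are simulated and all names are hypothetical. The passage above does not state how to judge the size of LR; the comparison with 3.84, the 5% critical value of a chi-squared distribution with one degree of freedom, relies on the standard asymptotic result and should be read as an assumption here.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def avg_log_lik(b1, b2, x, y):
    """Average logit log-likelihood for the index b1 + b2 x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b1 + b2 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)

def fit_logit(x, y, n_iter=25):
    """Unrestricted logit ML by a fixed number of Newton-Raphson steps."""
    b1 = b2 = 0.0
    for _ in range(n_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, yi in zip(x, y):
            p = logistic_cdf(b1 + b2 * xi)
            w = p * (1.0 - p)
            g1 += yi - p
            g2 += (yi - p) * xi
            h11 += w
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        b1 += (h22 * g1 - h12 * g2) / det
        b2 += (h11 * g2 - h12 * g1) / det
    return b1, b2

# Simulated data (assumption): true constant 0, true slope 0.5.
rng = random.Random(2)
n = 500
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y = [1 if rng.random() < logistic_cdf(0.5 * xi) else 0 for xi in x]

# Unrestricted model: constant and slope both estimated.
b1_u, b2_u = fit_logit(x, y)
L_u = avg_log_lik(b1_u, b2_u, x, y)

# Restricted model (slope = 0): the ML estimate of the constant solves
# F(b1) = mean(y), which has a closed form for the logit.
y_bar = sum(y) / n
b1_r = math.log(y_bar / (1.0 - y_bar))
L_r = avg_log_lik(b1_r, 0.0, x, y)

LR = 2 * n * (L_u - L_r)
print(f"L_u = {L_u:.4f}, L_r = {L_r:.4f}, LR = {LR:.2f}")
print("compare LR with 3.84 (chi-squared, 1 restriction, 5% level)")
```

Because the restricted model is nested in the unrestricted one, L_u can never be smaller than L_r, so LR is non-negative up to numerical error.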