3.Handouts_binary_dependent_variables
Up to now, the dependent variable, 𝑦𝑖 , has been continuous. However, in economics, there are
several situations in which the dependent variable is qualitative or only partially continuous. In all
these cases the linear regression model is inappropriate. Here we will focus on the models where
the dependent variable 𝑦𝑖 is a dummy variable:
𝐸(𝑦𝑖 |𝑥𝑖 ) = 𝑥𝑖 ′𝛽
Since 𝑦𝑖 is distributed as a Bernoulli, the error term 𝜀𝑖 = 𝑦𝑖 − 𝑥𝑖′𝛽 can take only two values for
a given 𝑥𝑖 , so it is not normally distributed.
This has no consequence for the unbiasedness and consistency of 𝛽̂𝑜𝑙𝑠 (the conditional
mean of 𝑦𝑖 is correctly specified), but 𝛽̂𝑜𝑙𝑠 is no longer normally distributed in small
samples. This implies that the test statistics do not have the well-known distributions
(e.g. the Student's t for testing hypotheses on a single parameter), which are derived
from the hypothesis of normality of the errors in small samples. The distributions of the
test statistics will instead be based on the asymptotic distribution of 𝛽̂𝑜𝑙𝑠 .
• 𝜺𝒊 is heteroskedastic
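Since 𝑦𝑖 is a Bernoulli variable with Pr(𝑦𝑖 = 1|𝑥𝑖 ) = 𝑥𝑖′𝛽, the conditional variance of the
error is:
𝑉(𝜀𝑖 |𝑥𝑖 ) = 𝑥𝑖′𝛽(1 − 𝑥𝑖′𝛽)
The error variance therefore depends on 𝑥𝑖 . This does not affect the unbiasedness of 𝛽̂𝑜𝑙𝑠 , but
a) 𝑉(𝛽̂𝑂𝐿𝑆 ) ≠ 𝜎²(𝑋′𝑋)⁻¹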
The main consequence is that the standard errors reported by the software after OLS
estimation are based on the wrong formula:
𝑠.𝑒.(𝛽̂𝑗 ) = 𝑠 ∙ √𝑐𝑗𝑗
where 𝑐𝑗𝑗 is the element in position (j,j) of the matrix (𝑋′𝑋)⁻¹ and
𝑠 = √(∑ 𝜖̂𝑖² /(𝑁 − 𝐾)). It follows that the significance tests on the coefficients are not
reliable.
To solve the heteroskedasticity problem we can use Weighted Least Squares. The model is
first estimated with OLS, which gives a consistent estimator of 𝛽, 𝛽̂𝑂𝐿𝑆 ; then the
transformed model, obtained by dividing each variable by √(𝑥𝑖′𝛽̂𝑂𝐿𝑆 (1 − 𝑥𝑖′𝛽̂𝑂𝐿𝑆 )), is
estimated again with OLS.
• 𝒙𝒊′𝜷̂𝐨𝐥𝐬 can be outside the interval [0,1]
In the linear probability model 𝐸(𝑦𝑖 |𝑥𝑖 ) = Pr(𝑦𝑖 = 1|𝑥𝑖 ), so 0 ≤ 𝐸(𝑦𝑖 |𝑥𝑖 ) ≤ 1: since the
dependent variable is binary, its expected value must lie inside the interval [0;1].
However, it is possible that the estimated 𝑥𝑖′𝛽̂𝑂𝐿𝑆 , which is the estimate of Pr(𝑦𝑖 = 1|𝑥𝑖 ),
is greater than 1 or lower than 0. Fitted values inside the interval [0;1] can be
interpreted as the estimated probability that the event 𝑦𝑖 = 1 occurs, while fitted values
outside the interval [0;1] cannot be interpreted, since a probability is limited to the
interval [0;1].
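The two problems described above (fitted values outside [0;1] and an error variance that changes with 𝑥𝑖 ) can be seen in a small simulation. The sketch below is only illustrative: the data-generating process, the sample size and the coefficients are hypothetical choices, not taken from the handout, and the model has a constant and a single regressor so that OLS can be computed with the textbook closed-form formulas.

```python
import random

random.seed(0)

# Hypothetical data: the true Pr(y = 1 | x) is linear in x but capped to [0, 1]
n = 500
x = [random.uniform(-3.0, 3.0) for _ in range(n)]
p_true = [min(max(0.5 + 0.25 * xi, 0.0), 1.0) for xi in x]
y = [1 if random.random() < p else 0 for p in p_true]

# OLS with a constant and one regressor (closed-form formulas)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values are the estimated probabilities x_i'b
fitted = [b0 + b1 * xi for xi in x]
outside = [p for p in fitted if p < 0.0 or p > 1.0]

# Estimated error variance x_i'b (1 - x_i'b), defined only for fitted values in (0, 1)
variances = [p * (1.0 - p) for p in fitted if 0.0 < p < 1.0]

print(f"slope = {b1:.3f}, fitted values outside [0, 1]: {len(outside)} of {n}")
print(f"estimated error variance ranges from {min(variances):.3f} to {max(variances):.3f}")
```

Note that the weight √(𝑥𝑖′𝛽̂𝑂𝐿𝑆 (1 − 𝑥𝑖′𝛽̂𝑂𝐿𝑆 )) is not defined for observations whose fitted value falls outside (0;1), so the weighted estimation cannot use them as they are.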
The last point is the real problem of the linear probability model. The method described below
ensures that the estimated probability lies in the interval [0,1].
The best way to introduce this estimation method is to start with an economic problem. For
example, imagine that we have data on a sample of families, and that we are interested in
determining the relevant variables in the choice of home buying. Drawing from our knowledge of
microeconomics, we could conceptualize a model that explains the choice of home buying as the
outcome of a utility maximization process on the individual or family side. This process determines
the maximum willingness to pay for a home (e.g., per square meter) based on their income and
various other family characteristics that reflect their preferences. While income and some family
characteristics are observable, the maximum willingness to pay itself is not directly observable.
Instead, we can observe its natural consequence: if this willingness exceeds a certain threshold
(e.g., the market price per square meter), then the family buys the home; otherwise, they choose
to rent.
Thus, our empirical analysis is based on an economic model that we cannot directly estimate
because the dependent variable differs. For this reason, we refer to the first model (the
theoretical one) as the "latent model," and the second (the estimable model) as the "limited
dependent variable model." In this example, the observed dependent variable is qualitative and
binary (i.e., whether the house is owned or not), and therefore, it can be interpreted as a dummy
variable equal to 1 if the house is owned and 0 if it is not.
Let us suppose that the theoretical model is linear:
𝑦𝑖 ∗ = 𝑥𝑖 ′𝛽 + 𝜀𝑖 ∗
where 𝑦𝑖∗ is the latent variable. Observations are available on the variables (𝑦𝑖 , 𝑥𝑖 ), i = 1, …, N,
where 𝑦𝑖 is the limited dependent variable, related to 𝑦𝑖∗ in this way:
𝑦𝑖 = 1 if 𝑦𝑖∗ > 0
𝑦𝑖 = 0 if 𝑦𝑖∗ ≤ 0
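The link between the latent variable and the observed dummy can be made concrete with a short simulation. This is a sketch under stated assumptions: the coefficient values, the sample size and the choice of standard normal latent errors are all hypothetical and serve only to show that the analyst sees 𝑦𝑖 and 𝑥𝑖 , never 𝑦𝑖∗.

```python
import random

random.seed(1)

beta0, beta1 = -0.5, 1.0   # hypothetical latent-model coefficients
n = 1000

x = [random.gauss(0.0, 1.0) for _ in range(n)]
eps_star = [random.gauss(0.0, 1.0) for _ in range(n)]  # assumed standard normal errors

# Latent variable (e.g. willingness to pay net of the threshold): not observed
y_star = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps_star)]

# Observed limited dependent variable: 1 if y* > 0, 0 otherwise
y = [1 if ys > 0 else 0 for ys in y_star]

share_of_ones = sum(y) / n
print(f"share of observations with y = 1: {share_of_ones:.3f}")
```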
Our goal is to write a model that gives us a consistent estimator:
𝑦𝑖 = 𝐸(𝑦𝑖 |𝑥𝑖 ) + 𝜀𝑖
To do so, we have to specify 𝐸(𝑦𝑖 |𝑥𝑖 ) = Pr(𝑦𝑖 = 1|𝑥𝑖 ) correctly, using the latent model:
Pr(𝑦𝑖 = 1|𝑥𝑖 ) = Pr(𝑦𝑖∗ > 0|𝑥𝑖 ) = Pr(𝑥𝑖′𝛽 + 𝜀𝑖∗ > 0|𝑥𝑖 ) = Pr(𝜀𝑖∗ > −𝑥𝑖′𝛽|𝑥𝑖 )
The proper specification of the conditional mean of 𝑦𝑖 depends on the distribution of the error
term of the latent model (𝜀𝑖∗ ). We suppose that the 𝜀𝑖∗ are iid with distribution function 𝐹(.)
and density function 𝑓(.), where 𝑓(.) is symmetric around 0, so that:
Pr(𝜀𝑖 ∗ > −𝑥𝑖 ′𝛽 |xi ) = Pr(𝜀𝑖 ∗ < 𝑥𝑖 ′𝛽 |xi ) = 𝐹(𝑥𝑖 ′𝛽)
Therefore the model to estimate is:
yi = 𝐹 (𝑥𝑖′ 𝛽) + 𝜀𝑖
The above model is not linear, and it guarantees that the estimated probability lies between 0
and 1, since 0 ≤ 𝐹(𝑥𝑖′𝛽) ≤ 1 by definition of a distribution function. As a consequence, the
marginal effects are no longer constant: they vary with 𝑥𝑖 . The parameters 𝛽 represent instead
the marginal effects of the latent model:
𝛽𝑗 = 𝜕𝐸(𝑦𝑖∗ |𝑥𝑖 ) / 𝜕𝑥𝑖𝑗
The marginal effects of the estimated model ( 𝑦𝑖 = 𝐹(𝑥𝑖′𝛽) + 𝜀𝑖 ) are different and they also have
a different interpretation: they measure the change in the probability that the event occurs
(i.e. that 𝑦𝑖 = 1) following a marginal change in variable j (approximately, a change of one unit):
𝑀𝐸𝑗 = 𝜕𝐸(𝑦𝑖 |𝑥𝑖 ) / 𝜕𝑥𝑖𝑗 = 𝜕𝐹(𝑥𝑖′𝛽) / 𝜕𝑥𝑖𝑗 = 𝑓(𝑥𝑖′𝛽)𝛽𝑗
where 𝑓 is the density function of 𝜀𝑖∗ . These marginal effects are not constant: they are a
function of the independent variables, so the marginal effect takes a different value for each
individual i, depending on the values of its explanatory variables.
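The chain-rule result 𝑀𝐸𝑗 = 𝑓(𝑥𝑖′𝛽)𝛽𝑗 can be checked numerically by comparing it with a finite-difference derivative of 𝐹(𝑥𝑖′𝛽). In the sketch below the logistic distribution is used for 𝐹 and 𝑓 purely as an illustrative assumption, and the coefficients and the three individuals are hypothetical.

```python
import math

def F(z):
    """Logistic distribution function (an illustrative choice for F)."""
    return 1.0 / (1.0 + math.exp(-z))

def f(z):
    """Logistic density, the derivative of F."""
    return F(z) * (1.0 - F(z))

beta = [0.2, 0.8, -0.5]  # hypothetical coefficients; the first one is the constant

def index(x):
    """Linear index x'beta."""
    return sum(b * v for b, v in zip(beta, x))

def marginal_effect(x, j):
    """Analytical marginal effect f(x'beta) * beta_j."""
    return f(index(x)) * beta[j]

def numerical_effect(x, j, h=1e-6):
    """Central finite-difference derivative of F(x'beta) with respect to x_j."""
    x_up = list(x)
    x_dn = list(x)
    x_up[j] += h
    x_dn[j] -= h
    return (F(index(x_up)) - F(index(x_dn))) / (2.0 * h)

# Three hypothetical individuals (the first entry is the constant)
individuals = [[1.0, -2.0, 1.0], [1.0, 0.0, 0.0], [1.0, 2.0, -1.0]]
analytical = [marginal_effect(x, 1) for x in individuals]
numerical = [numerical_effect(x, 1) for x in individuals]

for a, m in zip(analytical, numerical):
    print(f"analytical = {a:.6f}, finite difference = {m:.6f}")
```

The same coefficient 𝛽𝑗 produces a different marginal effect for each individual, because 𝑓 is evaluated at a different value of 𝑥𝑖′𝛽.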
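Now we can estimate the model efficiently, using the maximum likelihood method. The distribution
of 𝑦𝑖 is:
Pr(𝑦𝑖 = 1|𝑥𝑖 ) = 𝐹(𝑥𝑖′𝛽), Pr(𝑦𝑖 = 0|𝑥𝑖 ) = 1 − 𝐹(𝑥𝑖′𝛽)
or, more compactly,
𝑓(𝑦𝑖 |𝑥𝑖 , 𝛽, 𝜎) = [𝐹(𝑥𝑖′𝛽)]^𝑦𝑖 [1 − 𝐹(𝑥𝑖′𝛽)]^(1−𝑦𝑖 )
With independent observations, we have to maximise the log-likelihood:
𝑙𝑛𝐿(𝛽) = ∑ [𝑦𝑖 𝑙𝑛𝐹(𝑥𝑖′𝛽) + (1 − 𝑦𝑖 ) 𝑙𝑛(1 − 𝐹(𝑥𝑖′𝛽))], with the sum over i = 1, …, N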
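As an illustration of the maximisation, the sketch below fits a model with a constant and one regressor on simulated data. Several things here are assumptions and not part of the handout: 𝐹 is taken to be the logistic distribution function, the data and the true coefficients are hypothetical, and Newton-Raphson is just one possible numerical method for maximising the log-likelihood.

```python
import math
import random

random.seed(2)

def F(z):
    """Logistic distribution function (assumed form of F)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data generated from the binary model itself
true_beta = (-0.5, 1.0)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < F(true_beta[0] + true_beta[1] * xi) else 0 for xi in x]

def log_likelihood(b0, b1):
    """Sum over i of y_i ln F(x_i'b) + (1 - y_i) ln(1 - F(x_i'b))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = F(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Newton-Raphson iterations starting from zero coefficients
b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = F(b0 + b1 * xi)
        g0 += yi - p               # gradient with respect to the constant
        g1 += (yi - p) * xi        # gradient with respect to the slope
        w = p * (1.0 - p)
        h00 += w                   # elements of minus the Hessian
        h01 += w * xi
        h11 += w * xi * xi
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (h00 * g1 - h01 * g0) / det
    b0 += step0
    b1 += step1
    if abs(step0) + abs(step1) < 1e-10:
        break

print(f"estimates: constant = {b0:.3f}, slope = {b1:.3f}")
print(f"maximised log-likelihood = {log_likelihood(b0, b1):.3f}")
```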
Once we obtain 𝛽̂, since the marginal effect is not constant across individuals, a summary
indicator of the marginal effects can be obtained in two ways:
a) Calculate the marginal effect at the mean of the explanatory variables:
𝑀𝐸̂𝑗 = 𝑓(𝑥̅′𝛽̂)𝛽̂𝑗
b) Calculate the mean of the individual marginal effects:
𝑀𝐸̂𝑗 = (1/𝑁) ∑ 𝑓(𝑥𝑖′𝛽̂)𝛽̂𝑗 , with the sum over i = 1, …, N
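The two summary indicators generally give different numbers, because 𝑓 is not linear. The sketch below computes both for a tiny hypothetical sample; the estimates in `beta_hat`, the data in `X` and the use of the logistic density for 𝑓 are illustrative assumptions.

```python
import math

def f(z):
    """Logistic density (an illustrative choice for f)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

beta_hat = [0.2, 0.8]  # hypothetical estimates: constant and slope
X = [[1.0, -2.0], [1.0, -0.5], [1.0, 0.0], [1.0, 1.0], [1.0, 3.0]]  # hypothetical data
j = 1                  # position of the regressor of interest
n = len(X)

# a) marginal effect at the mean of the explanatory variables
x_bar = [sum(row[k] for row in X) / n for k in range(len(beta_hat))]
me_at_mean = f(dot(x_bar, beta_hat)) * beta_hat[j]

# b) mean of the individual marginal effects
individual_me = [f(dot(row, beta_hat)) * beta_hat[j] for row in X]
average_me = sum(individual_me) / n

print(f"marginal effect at the mean: {me_at_mean:.4f}")
print(f"average marginal effect:     {average_me:.4f}")
```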
3. Significance and Goodness of Fit in latent variable model
𝐻1 : there is at least one 𝛽𝑗 ≠ 0
• 𝐿0 : the maximised value of the likelihood (max 𝑙𝑛𝐿) of the restricted model, where all
coefficients, save the constant term, are equal to zero
• 𝐿1 : the maximised value of the likelihood (max 𝑙𝑛𝐿) of the unrestricted (complete) model,
where the coefficients are free.
The test statistic is:
2[𝑙𝑛𝐿1 − 𝑙𝑛𝐿0 ] ∼ 𝜒²(𝐾−1) asymptotically
where (K-1) is the number of regressors in the unrestricted model, save the constant (that is,
the number of restrictions). If the test statistic is greater than the critical value, the null
hypothesis is rejected.
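The arithmetic of the test can be sketched as follows. The sample of 40 ones and 60 zeros, the value of `lnL1` and the number of restrictions are hypothetical, chosen only to show the calculation. The restricted log-likelihood uses the fact that, with only a constant and a strictly increasing 𝐹, the fitted probability equals the sample share of ones; the 5% critical values are the standard chi-square table values.

```python
import math

# 5% critical values of the chi-square distribution for 1 to 5 degrees of freedom
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def restricted_loglik(y):
    """Maximised log-likelihood of the constant-only model.

    The fitted probability is the sample share of ones, so no iteration is needed.
    """
    n = len(y)
    n1 = sum(y)
    n0 = n - n1
    return n1 * math.log(n1 / n) + n0 * math.log(n0 / n)

def lr_test(lnL1, lnL0, n_restrictions):
    """Return the LR statistic, the 5% critical value and the rejection decision."""
    stat = 2.0 * (lnL1 - lnL0)
    critical = CHI2_CRITICAL_5PCT[n_restrictions]
    return stat, critical, stat > critical

y = [1] * 40 + [0] * 60      # hypothetical sample
lnL0 = restricted_loglik(y)
lnL1 = -60.0                 # hypothetical maximised log-likelihood of the complete model
stat, critical, reject = lr_test(lnL1, lnL0, n_restrictions=2)

print(f"lnL0 = {lnL0:.3f}, LR statistic = {stat:.3f}, critical value = {critical}")
print("reject the null hypothesis" if reject else "do not reject the null hypothesis")
```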
Bearing in mind what we developed in section 2, we can write the model to be estimated as:
𝑦𝑖 = Φ(𝑥𝑖′𝛽) + 𝜀𝑖
The marginal effects of the estimated model are:
𝑀𝐸𝑗 = 𝜕𝐸(𝑦𝑖 |𝑥𝑖 ) / 𝜕𝑥𝑖𝑗 = 𝜕Φ(𝑥𝑖′𝛽) / 𝜕𝑥𝑖𝑗 = 𝜙(𝑥𝑖′𝛽)𝛽𝑗
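To give a sense of the magnitudes, the sketch below evaluates Φ and 𝜙 with the standard library and computes 𝜙(𝑥𝑖′𝛽)𝛽𝑗 at a few values of the index. The coefficient `beta_j` and the index values are hypothetical.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

beta_j = 0.6                                   # hypothetical coefficient
index_values = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical values of x_i'beta

probabilities = [Phi(z) for z in index_values]
effects = [phi(z) * beta_j for z in index_values]

for z, p, me in zip(index_values, probabilities, effects):
    print(f"x'beta = {z:+.1f}  Pr(y = 1) = {p:.4f}  marginal effect = {me:.4f}")
```

The marginal effect is largest where 𝑥𝑖′𝛽 = 0, that is where the probability is 0.5, and shrinks symmetrically as the index moves away from zero.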
The logistic density function is close to the normal one and is symmetric around its mean. We
can write the model as:
𝑦𝑖 = Λ(𝑥𝑖′𝛽) + 𝜀𝑖
𝑦𝑖 = 𝑒^(𝑥𝑖′𝛽) / (1 + 𝑒^(𝑥𝑖′𝛽)) + 𝜀𝑖
where the marginal effects are:
𝑀𝐸𝑗 = 𝜕𝐸(𝑦𝑖 |𝑥𝑖 ) / 𝜕𝑥𝑖𝑗 = 𝑓(𝑥𝑖′𝛽)𝛽𝑗 = [𝑒^(𝑥𝑖′𝛽) / (1 + 𝑒^(𝑥𝑖′𝛽))²] 𝛽𝑗
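The logistic density in the formula above can also be written as Λ(𝑥𝑖′𝛽)[1 − Λ(𝑥𝑖′𝛽)], which is often more convenient in computations. The sketch below checks this identity numerically and evaluates the marginal effect at a few index values; `beta_j` and the index values are hypothetical.

```python
import math

def Lambda(z):
    """Logistic distribution function e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

def logistic_density(z):
    """Logistic density e^z / (1 + e^z)^2."""
    return math.exp(z) / (1.0 + math.exp(z)) ** 2

beta_j = 0.6                                   # hypothetical coefficient
index_values = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical values of x_i'beta

effects = [logistic_density(z) * beta_j for z in index_values]

# Largest gap between the density and Lambda * (1 - Lambda) over the grid
identity_gap = max(
    abs(logistic_density(z) - Lambda(z) * (1.0 - Lambda(z))) for z in index_values
)

for z, me in zip(index_values, effects):
    print(f"x'beta = {z:+.1f}  marginal effect = {me:.4f}")
print(f"largest deviation from Lambda(1 - Lambda): {identity_gap:.2e}")
```

At 𝑥𝑖′𝛽 = 0 the density equals 1/4, so the marginal effect there is 𝛽𝑗 /4, which is its maximum.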
Considering the maximum likelihood function, 𝑦𝑖 is a Bernoulli with a distribution equal to: