3.Handouts_binary_dependent_variables

The document discusses binary dependent variable models in econometrics, focusing on situations where the dependent variable is qualitative or binary, such as decisions to buy a house or attend university. It introduces the Linear Probability Model and the Latent Variable Model, highlighting issues like heteroskedasticity and the limitations of OLS estimators for binary outcomes. The document also explains the significance testing for coefficients in the context of maximum likelihood estimation.

Uploaded by

benassi.giochi
Copyright
© © All Rights Reserved

HANDOUTS of ECONOMETRICS

TOPIC [3]: BINARY DEPENDENT VARIABLE MODEL

Up to now, the dependent variable, 𝑦𝑖 , has been continuous. However, in economics there are
several situations in which the dependent variable is qualitative or only partially continuous. In all
these cases the linear regression model is inappropriate. Here we focus on models where
the dependent variable 𝑦𝑖 is a dummy variable:

• Why do some individuals decide to go to university and others do not?
• Why do some individuals decide to buy a house, while others rent?
• Why do some women decide to work and others do not?
In all these cases the variable we want to explain is binary; for example:
$$y_i = \begin{cases} 1 & \text{if the house is owned} \\ 0 & \text{if the house is not owned} \end{cases}$$

1. Linear Probability Model


Let us consider a linear model where 𝑦𝑖 is a function of the wage and other characteristics:
𝑦𝑖 = 𝑥𝑖 ′𝛽 + 𝜀𝑖
Under the usual assumption 𝐸(𝜀𝑖 |𝑥𝑖 ) = 0, the model is:

𝐸(𝑦𝑖 |𝑥𝑖 ) = 𝑥𝑖 ′𝛽
Since 𝑦𝑖 is distributed as a Bernoulli:

𝐸(𝑦𝑖 |𝑥𝑖 ) = 1 ∗ Pr(𝑦𝑖 = 1|𝑥𝑖 ) + 0 ∗ Pr(𝑦𝑖 = 0|𝑥𝑖 ) = Pr(𝑦𝑖 = 1|𝑥𝑖 )


As a consequence, the estimated model explains the probability that a given event occurs (e.g.
the purchase of the house):
𝑦𝑖 = Pr(𝑦𝑖 = 1|𝑥𝑖 ) + 𝜀𝑖

and the probability that the event occurs is a linear function of the regressors:


Pr(𝑦𝑖 = 1|𝑥𝑖 ) = 𝑥𝑖 ′𝛽
For this reason, the model is called the “Linear Probability Model”. It would seem that the least
squares method can be applied even when the dependent variable is binary and that, compared to the
models studied so far, the only difference lies in the interpretation in terms of probability (e.g. the
probability of purchasing the house). Unfortunately, in this model the OLS estimator has some
problems:
• 𝜺𝒊 is not normally distributed
𝜺𝒊 can take only two values, so it follows a two-point (Bernoulli-type) distribution:
indeed εi = yi − xi′β, and therefore

$$\varepsilon_i = \begin{cases} 1 - x_i'\beta & \text{with probability } x_i'\beta \quad (y_i = 1) \\ -x_i'\beta & \text{with probability } 1 - x_i'\beta \quad (y_i = 0) \end{cases}$$

This has no consequence for unbiasedness and consistency (the conditional mean of yi is
correctly specified), but 𝛽̂𝑜𝑙𝑠 is no longer exactly normally distributed in finite samples.
This implies that the test statistics do not have the well-known distributions (e.g.
Student's t for testing hypotheses on a single parameter), which are derived from the
assumption of normal errors in small samples. Inference will
instead be based on the asymptotic distribution of 𝛽̂𝑜𝑙𝑠 .

• 𝜺𝒊 is heteroskedastic
Since εi follows a two-point (Bernoulli-type) distribution:

𝑉( εi |𝑥𝑖 ) = 𝑥𝑖 ′𝛽(1 − 𝑥𝑖′ 𝛽)

The error variance depends on 𝑥𝑖 . This does not affect the unbiasedness of 𝛽̂𝑜𝑙𝑠 , but

a) $V(\hat\beta_{OLS}) \neq \sigma^2 (X'X)^{-1}$

The main consequence is that the standard errors reported by software when
the model is estimated with OLS are based on the wrong formula:

$$s.e.(\hat\beta_j) = s \cdot \sqrt{c_{jj}}$$
where $c_{jj}$ is the element in position (j,j) of the matrix $(X'X)^{-1}$ and
$s = \sqrt{\sum_i \hat\varepsilon_i^2 / (N - K)}$. It follows that the significance tests on the coefficients are not
reliable.

b) 𝛽̂𝑜𝑙𝑠 is no longer BLUE

To solve the heteroskedasticity problem we can use Weighted Least Squares. The
model is first estimated with OLS, which gives a consistent estimator of 𝛽, 𝛽̂𝑂𝐿𝑆 ; then the
transformed model, obtained by dividing every variable by $\sqrt{x_i'\hat\beta_{OLS}\,(1 - x_i'\hat\beta_{OLS})}$, is estimated again with
OLS.
• 𝑥𝑖 ′𝛽̂𝑂𝐿𝑆 can be outside the interval [0,1]
In the linear probability model 𝐸(𝑦𝑖 |𝑥𝑖 ) = Pr(𝑦𝑖 = 1|𝑥𝑖 ), so 0 ≤ 𝐸(𝑦𝑖 |𝑥𝑖 ) ≤ 1: since the
dependent variable is binary, its expected value must lie in the interval [0,1]. However, it is
possible that the fitted value 𝑥𝑖 ′𝛽̂𝑂𝐿𝑆 , which estimates Pr(𝑦𝑖 = 1|𝑥𝑖 ), is lower than 0 or bigger
than 1. Fitted values inside the interval [0,1] can be interpreted as the estimated
probability that the event 𝑦𝑖 = 1 occurs, while fitted values outside the interval [0,1]
cannot be interpreted, since a probability is bounded to the interval [0,1].
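The variance formula in the heteroskedasticity bullet follows directly from the fact that the error takes only two values. As a short worked derivation, write p_i = x_i'β (a shorthand introduced here only for readability), so that ε_i equals 1 − p_i with probability p_i and −p_i with probability 1 − p_i:

```latex
\begin{aligned}
E(\varepsilon_i \mid x_i) &= (1 - p_i)\,p_i + (-p_i)\,(1 - p_i) = 0, \\
V(\varepsilon_i \mid x_i) &= (1 - p_i)^2\,p_i + (-p_i)^2\,(1 - p_i) \\
                          &= p_i\,(1 - p_i)\,\bigl[(1 - p_i) + p_i\bigr] \\
                          &= p_i\,(1 - p_i) = x_i'\beta\,(1 - x_i'\beta).
\end{aligned}
```

The variance is largest when p_i = 0.5 and shrinks towards zero as p_i approaches 0 or 1, which is why it cannot be constant across observations unless all regressors are irrelevant.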
This last point is the real problem of the linear probability model. The method presented in the next
section ensures that the estimated probability lies in the interval [0,1].
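To make the out-of-range problem concrete, the following is a minimal sketch of a linear probability model fitted by OLS with a single regressor. The data are a hypothetical toy sample invented for this illustration, not taken from the handout, and the variable names are arbitrary.

```python
# Hypothetical toy data (an assumption for illustration): one regressor, binary outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for the simple regression y_i = b0 + b1 * x_i + e_i.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values are the estimated probabilities Pr(y_i = 1 | x_i).
fitted = [b0 + b1 * xi for xi in x]

# Implied error variances p(1 - p); they turn negative when p leaves [0, 1].
implied_var = [p * (1 - p) for p in fitted]

for xi, p, v in zip(x, fitted, implied_var):
    flag = "" if 0.0 <= p <= 1.0 else "  <-- outside [0, 1]"
    print(f"x={xi}  fitted={p:.3f}  p(1-p)={v:.3f}{flag}")
```

On this sample the smallest and largest values of x produce fitted "probabilities" below 0 and above 1. The implied variance p(1 − p) is then negative, so the weighted least squares correction described above cannot be computed for those observations.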

2. Latent variable model

The best way to introduce this estimation method is to start with an economic problem. For
example, imagine that we have data on a sample of families, and that we are interested in
determining the relevant variables in the choice of home buying. Drawing from our knowledge of
microeconomics, we could conceptualize a model that explains the choice of home buying as the
outcome of a utility maximization process on the individual or family side. This process determines
the maximum willingness to pay for a home (e.g., per square meter) based on their income and
various other family characteristics that reflect their preferences. While income and some family
characteristics are observable, the maximum willingness to pay itself is not directly observable.
Instead, we can observe its natural consequence: if this willingness exceeds a certain threshold
(e.g., the market price per square meter), then the family buys the home; otherwise, they choose
to rent.
Thus, our empirical analysis is based on an economic model that we cannot directly estimate
because its dependent variable is not observed. For this reason, we refer to the first model (the
theoretical one) as the "latent model," and the second (the estimable model) as the "limited
dependent variable model." In this example, the observed dependent variable is qualitative and
binary (i.e., whether the house is owned or not), and therefore, it can be interpreted as a dummy
variable equal to 1 if the house is owned and 0 if it is not.
Let us suppose that the theoretical model is linear:
𝑦𝑖 ∗ = 𝑥𝑖 ′𝛽 + 𝜀𝑖 ∗

where 𝑦𝑖∗ is the latent variable. Observations are available on the variables (yi , xi ), i=1,…,N, where
yi is the limited dependent variable, related to 𝑦𝑖∗ in this way:
$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
Our goal is to write a model that gives us a consistent estimator:
𝑦𝑖 = 𝐸(𝑦𝑖 |𝑥𝑖 ) + 𝜀𝑖
To do so, we have to specify 𝐸(yi |xi ) = Pr (yi = 1|xi ) correctly.

To accomplish this, we need to specify Pr(yi = 1|xi ), using the latent model:

Pr(yi = 1|xi ) = Pr(𝑦𝑖∗ > 0 |xi )


= Pr(𝑥𝑖 ′𝛽 + 𝜀𝑖 ∗ > 0 |xi )
= Pr(𝜀𝑖 ∗ > −𝑥𝑖 ′𝛽 |xi )

The proper specification of the conditional mean of yi depends on the distribution of the error term
of the latent model (𝜀𝑖 ∗ ). We suppose that the 𝜀𝑖 ∗ are iid with
(cumulative) distribution function 𝐹(.) and density function 𝑓(.),
where 𝑓(.) is symmetric around 0, so that:
Pr(𝜀𝑖 ∗ > −𝑥𝑖 ′𝛽 |xi ) = Pr(𝜀𝑖 ∗ < 𝑥𝑖 ′𝛽 |xi ) = 𝐹(𝑥𝑖 ′𝛽)
Therefore the model to estimate is:
yi = 𝐹 (𝑥𝑖′ 𝛽) + 𝜀𝑖
The above model is not linear and it guarantees that the estimated probability is between 0 and 1,
since 0 ≤ 𝐹 (𝑥𝑖′ 𝛽) ≤ 1 by definition of a (cumulative) distribution function. As a consequence, the marginal
effects are no longer constant, but vary with xi . Indeed, the parameters β represent the
marginal effects of the latent model only:
$$\beta_j = \frac{\partial E(y_i^* \mid x_i)}{\partial x_{ij}}$$
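The link between the latent model and the observed probability, Pr(y = 1 | x) = F(x'β), can be checked by simulation. The sketch below assumes a standard logistic error and a hypothetical index value x'β = 0.8. Both choices are illustrative assumptions, and any distribution with a density symmetric around 0 would serve equally well.

```python
import math
import random

def logistic_cdf(z):
    """Distribution function F(z) of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-z))

def draw_logistic(rng):
    """Draw from the standard logistic by inverting its distribution function."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))

rng = random.Random(0)   # fixed seed, only for reproducibility
index = 0.8              # hypothetical value of x_i' beta
n_draws = 100_000

# Latent model: y* = x'beta + eps*; the observed y equals 1 when y* > 0.
share_ones = sum(index + draw_logistic(rng) > 0 for _ in range(n_draws)) / n_draws

print("simulated Pr(y = 1):", round(share_ones, 4))
print("F(x'beta):          ", round(logistic_cdf(index), 4))

# Symmetry of f around 0 implies Pr(eps > -a) = 1 - F(-a) = F(a).
symmetry_gap = abs((1.0 - logistic_cdf(-index)) - logistic_cdf(index))
print("symmetry gap:", symmetry_gap)
```

The simulated share of ones should be close to F(x'β) up to sampling noise, and the symmetry gap should be zero up to rounding error.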

The marginal effects of the estimated model ( yi = 𝐹(𝑥𝑖 ′𝛽) + 𝜀𝑖 ) are different and they also have a
different interpretation: they measure the change in the probability that the event occurs (i.e. that 𝑦𝑖 = 1)
for a marginal change (approximately, a one-unit change) in variable j:

$$ME_j = \frac{\partial E(y_i \mid x_i)}{\partial x_{ij}} = \frac{\partial F(x_i'\beta)}{\partial x_{ij}} = f(x_i'\beta)\,\beta_j$$
where f is the density function of 𝜀𝑖∗ . These marginal effects are not constant: they are
a function of the independent variables, so the marginal effect takes a different value for each individual
i depending on the values of its explanatory variables.
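As an illustration of how the marginal effect f(x'β)β_j changes with x, the following sketch evaluates it at three values of a single regressor and compares it with a numerical derivative of F(x'β). The logistic distribution and the coefficient values are assumptions chosen only for the example.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    cdf = logistic_cdf(z)
    return cdf * (1.0 - cdf)

# Hypothetical coefficients (constant and slope), not estimates from real data.
beta0, beta1 = -2.0, 0.5

def prob(x):
    """Pr(y = 1 | x) = F(beta0 + beta1 * x)."""
    return logistic_cdf(beta0 + beta1 * x)

def marginal_effect(x):
    """Analytical marginal effect f(x'beta) * beta1."""
    return logistic_pdf(beta0 + beta1 * x) * beta1

h = 1e-6  # step for the central finite difference
for x in (0.0, 4.0, 8.0):
    numeric = (prob(x + h) - prob(x - h)) / (2 * h)
    print(f"x={x}: analytical={marginal_effect(x):.6f}  numerical={numeric:.6f}")
```

The effect is largest where the index x'β is zero (here at x = 4) and shrinks in the tails, although β_1 itself never changes.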
Now we can estimate the model efficiently, using the maximum likelihood method. The distribution
of 𝑦𝑖 is:

Pr(𝑦𝑖 = 1|𝑥𝑖 ) = 𝐹(𝑥𝑖′𝛽) and Pr(𝑦𝑖 = 0|𝑥𝑖 ) = 1 − 𝐹(𝑥𝑖′𝛽)

or, more compactly,
$$f(y_i \mid x_i, \beta) = [F(x_i'\beta)]^{y_i}\,[1 - F(x_i'\beta)]^{1-y_i}$$

So the log-likelihood function is:

$$\ln L = \sum_{i=1}^{N} \left\{ y_i \ln F(x_i'\beta) + (1 - y_i) \ln[1 - F(x_i'\beta)] \right\}$$
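The log-likelihood above translates almost line by line into code. The sketch below is one possible implementation under stated assumptions: the function name `log_likelihood`, the logistic choice for F, and the toy data are all hypothetical and serve only to show how the sum is evaluated for a given β.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y, cdf=logistic_cdf):
    """Sum over i of y_i * ln F(x_i'beta) + (1 - y_i) * ln[1 - F(x_i'beta)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(b * v for b, v in zip(beta, xi))  # x_i' beta
        p = cdf(index)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Hypothetical toy sample: each row of X holds a constant and one regressor.
X = [(1.0, float(v)) for v in range(10)]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

ll_zero = log_likelihood((0.0, 0.0), X, y)   # every probability equals 0.5
ll_alt = log_likelihood((-4.5, 1.0), X, y)   # an arbitrary alternative beta

print("lnL at beta = (0, 0):     ", round(ll_zero, 4))
print("lnL at beta = (-4.5, 1.0):", round(ll_alt, 4))
```

A β that lines the fitted probabilities up with the observed outcomes yields a larger (less negative) value, which is what the maximisation step exploits.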

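We have to maximise:

$$\hat\beta = \arg\max_{\beta} \ln L$$

where $\hat\beta$ solves the following K first-order conditions:

$$\frac{\partial \ln L}{\partial \beta} = 0 \;\Leftrightarrow\; \sum_{i=1}^{N} \left\{ y_i \frac{f(x_i'\beta)}{F(x_i'\beta)}\,x_i - (1 - y_i) \frac{f(x_i'\beta)}{1 - F(x_i'\beta)}\,x_i \right\} = 0$$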

Once we obtain 𝛽̂, since the marginal effect is not constant across observations, we can summarise
it with a single indicator in two ways:
a) Calculate the marginal effect at the mean of the explanatory variables:

$$\widehat{ME}_j = f(\bar{x}'\hat\beta)\,\hat\beta_j$$

b) Calculate the mean of the marginal effects:

$$\widehat{ME}_j = \frac{1}{N} \sum_{i=1}^{N} f(x_i'\hat\beta)\,\hat\beta_j$$

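The two summary measures generally differ, because f is not linear. A minimal sketch, assuming a logistic density and hypothetical estimates (the values of `beta0_hat` and `beta1_hat` are made up for the example):

```python
import math

def logistic_pdf(z):
    cdf = 1.0 / (1.0 + math.exp(-z))
    return cdf * (1.0 - cdf)

# Hypothetical estimates and regressor values, chosen only for illustration.
beta0_hat, beta1_hat = -4.5, 1.0
x_values = [float(v) for v in range(10)]
n = len(x_values)
x_bar = sum(x_values) / n

# a) Marginal effect evaluated at the mean of the explanatory variable.
me_at_mean = logistic_pdf(beta0_hat + beta1_hat * x_bar) * beta1_hat

# b) Mean of the individual marginal effects.
avg_me = sum(logistic_pdf(beta0_hat + beta1_hat * xi) * beta1_hat for xi in x_values) / n

print("marginal effect at the mean:", round(me_at_mean, 4))
print("average marginal effect:    ", round(avg_me, 4))
```

In this example the mean of x sits exactly where the density peaks, so measure (a) is much larger than measure (b). With other data the ranking can differ, so it is worth stating which of the two is being reported.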
3. Significance and Goodness of Fit in latent variable model

3.1 Significance of the coefficients


To evaluate the significance of a single coefficient, the t-test is used as usual. However,
since the Maximum Likelihood Estimator (MLE) is only asymptotically normal, the t-statistic
is asymptotically distributed as N(0,1) under the null hypothesis. Therefore, we have to use the
critical values of the standard normal distribution instead of those of the Student’s t distribution.
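A short sketch of the test on a single coefficient follows. The estimate and its standard error are hypothetical numbers, and the normal distribution function is built from `math.erf` in the Python standard library.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical ML estimate and standard error (not from real data).
beta_hat = 0.42
std_err = 0.15

z_stat = beta_hat / std_err                      # statistic for H0: beta_j = 0
p_value = 2.0 * (1.0 - normal_cdf(abs(z_stat)))  # two-sided asymptotic p-value

critical_5pct = 1.96                             # two-sided 5% critical value of N(0,1)
reject = abs(z_stat) > critical_5pct

print("z statistic:", round(z_stat, 3))
print("p-value:    ", round(p_value, 4))
print("reject H0 at 5%:", reject)
```

The only change from the linear regression case is the reference distribution: the critical value comes from N(0,1) regardless of the sample size, because the result is asymptotic.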
3.2 Significance of the regression
To test the significance of the regression we need, as usual, to test the following hypotheses:
$$H_0: \beta_j = 0 \text{ for each } j \text{ (except the constant term)}$$

$$H_1: \text{there is at least one } \beta_j \neq 0$$

The following values of the likelihood function are used:

• $L_0$: the maximised value of the likelihood (corresponding to max 𝑙𝑛𝐿) obtained from the “restricted” model,
where all coefficients, except the constant term, are equal to zero
• $L_1$: the maximised value of the likelihood (corresponding to max 𝑙𝑛𝐿) where the coefficients are
unrestricted (complete model).
The test statistic is:
$$2[\ln L_1 - \ln L_0] \sim \chi^2_{K-1} \quad \text{(asymptotically)}$$

where (K−1) is the number of regressors in the unrestricted model, excluding the constant (i.e. the
number of restrictions). If the test statistic is greater than the critical value, the null hypothesis is rejected.
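The likelihood ratio test can be sketched as follows. The two log-likelihood values are hypothetical, and the example assumes K − 1 = 2 restrictions. That case is chosen because the chi-square distribution with 2 degrees of freedom has the closed-form tail probability exp(−x/2), so no statistical library is needed. For other degrees of freedom a table or a statistics package would be required.

```python
import math

# Hypothetical maximised log-likelihoods (assumed values, not real estimates).
lnL0 = -68.2   # restricted model: constant only
lnL1 = -61.5   # unrestricted model: constant plus K - 1 = 2 regressors

lr_stat = 2.0 * (lnL1 - lnL0)

# With 2 degrees of freedom, Pr(chi2 > x) = exp(-x / 2).
p_value = math.exp(-lr_stat / 2.0)

# 5% critical value for 2 degrees of freedom: solve exp(-c / 2) = 0.05.
critical_5pct = -2.0 * math.log(0.05)

reject = lr_stat > critical_5pct

print("LR statistic:     ", round(lr_stat, 3))
print("5% critical value:", round(critical_5pct, 4))
print("p-value:          ", round(p_value, 5))
print("reject H0:        ", reject)
```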

3.3 Goodness of fit


In a model with a latent variable, it is not possible to evaluate the goodness of fit with the 𝑅² .
Alternative measures are therefore based on 𝐿1 and 𝐿0 , defined as above, and in particular on the
distance between 𝑙𝑛𝐿1 and 𝑙𝑛𝐿0 : the greater this distance, the greater the contribution of the
complete model to the explanation of the dependent variable relative to the
restricted one. The two most commonly used goodness-of-fit indicators are:
$$\text{pseudo } R^2 = 1 - \frac{1}{1 + 2(\ln L_1 - \ln L_0)/N}$$
$$\text{McFadden } R^2 = 1 - \frac{\ln L_1}{\ln L_0}$$
Since the likelihood of a binary model is a joint probability, 0 ≤ 𝐿 ≤ 1, which implies
𝑙𝑛𝐿 ≤ 0. Moreover, the unrestricted maximum is never smaller than the restricted
maximum, so 𝑙𝑛𝐿1 ≥ 𝑙𝑛𝐿0 . As a consequence:

0 ≤ 𝑝𝑠𝑒𝑢𝑑𝑜 𝑅² ≤ 1 and 0 ≤ 𝑀𝑐𝐹𝑎𝑑𝑑𝑒𝑛 𝑅² ≤ 1

Both indicators are equal to zero when all the coefficients of the explanatory variables are equal to
zero, in other words when 𝑙𝑛𝐿1 = 𝑙𝑛𝐿0 : the restricted model likelihood is equal to the unrestricted one, so the
explanatory variables in the model do not add any explanatory power compared to a model
with only a constant term.
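Both indicators are simple functions of the two maximised log-likelihoods. In the sketch below the function names and the numerical inputs are hypothetical, chosen only to show the arithmetic and the boundary case lnL1 = lnL0.

```python
def pseudo_r2(lnL1, lnL0, n_obs):
    """pseudo R^2 = 1 - 1 / (1 + 2 (lnL1 - lnL0) / N)."""
    return 1.0 - 1.0 / (1.0 + 2.0 * (lnL1 - lnL0) / n_obs)

def mcfadden_r2(lnL1, lnL0):
    """McFadden R^2 = 1 - lnL1 / lnL0."""
    return 1.0 - lnL1 / lnL0

# Hypothetical values (assumed for the example, not real estimates).
lnL0, lnL1, n_obs = -68.2, -61.5, 100

r2_pseudo = pseudo_r2(lnL1, lnL0, n_obs)
r2_mcfadden = mcfadden_r2(lnL1, lnL0)

print("pseudo R2:  ", round(r2_pseudo, 4))
print("McFadden R2:", round(r2_mcfadden, 4))

# Boundary case: the regressors add nothing, so lnL1 = lnL0 and both are zero.
print(pseudo_r2(lnL0, lnL0, n_obs), mcfadden_r2(lnL0, lnL0))
```

The two measures are on different scales, so values from one should not be compared with values from the other.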

4. PROBIT AND LOGIT: two different applications


Up to now, we have described a general methodology where the density function and the (cumulative)
distribution function are denoted by 𝑓(∙) and 𝐹(∙). Now we assume a specific distribution for the error
term of the latent model (normal or logistic), which defines a specific density and distribution function.
Depending on this assumption, we get the Probit or the Logit model.

4.1 PROBIT MODEL


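Let us suppose that the error term of the latent model is normally distributed:
𝜀𝑖∗ ~𝑖𝑖𝑑 𝑁(0,1)
Therefore, the (cumulative) distribution function is:
$$F(x_i'\beta) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2\right) dz = \int_{-\infty}^{x_i'\beta} \phi(z)\,dz$$

Bearing in mind what we developed in section 2, we can write the model to estimate as:
𝑦𝑖 = Φ(𝑥𝑖′𝛽) + 𝜀𝑖
The marginal effects of the estimated model are:

$$ME_j = \frac{\partial E(y_i \mid x_i)}{\partial x_{ij}} = \frac{\partial \Phi(x_i'\beta)}{\partial x_{ij}} = \phi(x_i'\beta)\,\beta_j$$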

Where 𝜙 is the density function of the normal N(0,1).


Also in this case we estimate the model using the maximum likelihood method:

$$\ln L = \sum_{i=1}^{N} \left\{ y_i \ln \Phi(x_i'\beta) + (1 - y_i) \ln[1 - \Phi(x_i'\beta)] \right\}$$

$$\hat\beta = \arg\max_{\beta} \ln L$$
In the Probit model the second-order conditions hold (in the absence of perfect multicollinearity), i.e. 𝑙𝑛𝐿 is
concave. Hence a maximum is a global maximum and 𝛽̂ can be found numerically, even though the
first-order conditions are not linear in 𝛽 and an iterative algorithm is required.
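The concavity of the Probit log-likelihood can be illustrated, though of course not proved, with a rough numerical check: along any direction, the second difference lnL(b + h) − 2 lnL(b) + lnL(b − h) of a concave function is never positive. The sketch below uses a hypothetical toy sample, keeps the constant fixed at an arbitrary value and varies only the slope. It also evaluates the Probit marginal effect at a zero index.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical toy data (an assumption for illustration).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def probit_loglik(b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = normal_cdf(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

b0 = -2.0   # constant held fixed at an arbitrary value
h = 0.1     # step used for the second difference
second_diffs = []
for b1 in (0.2, 0.4, 0.6):
    d2 = probit_loglik(b0, b1 + h) - 2.0 * probit_loglik(b0, b1) + probit_loglik(b0, b1 - h)
    second_diffs.append(d2)
    print(f"slope={b1}: second difference = {d2:.4f}")

# Probit marginal effect phi(x'beta) * beta_j at a zero index, with beta_j = 0.4.
me_at_zero_index = normal_pdf(0.0) * 0.4
print("marginal effect at zero index:", round(me_at_zero_index, 4))
```

All second differences come out negative, which is consistent with a concave objective. That property is what lets iterative algorithms such as Newton-Raphson work reliably on this problem.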

4.2 LOGIT MODEL


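This second model is built assuming that the errors of the theoretical (or latent) model
(𝜀𝑖∗ ) follow a standard logistic distribution. In this case, the (cumulative) distribution function and the density function
are:

$$F(x_i'\beta) = \Lambda(x_i'\beta) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$
$$f(x_i'\beta) = \frac{e^{x_i'\beta}}{\left(1 + e^{x_i'\beta}\right)^2} = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} \cdot \frac{1}{1 + e^{x_i'\beta}} = \Lambda(x_i'\beta)\,[1 - \Lambda(x_i'\beta)]$$

The logistic density is close to the normal density and symmetric around its mean. We can
write the model as:
𝑦𝑖 = Λ(𝑥𝑖′𝛽) + 𝜀𝑖

$$y_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} + \varepsilon_i$$
where the marginal effects are:

$$ME_j = \frac{\partial E(y_i \mid x_i)}{\partial x_{ij}} = f(x_i'\beta)\,\beta_j = \frac{e^{x_i'\beta}}{\left(1 + e^{x_i'\beta}\right)^2}\,\beta_j$$

For the likelihood function, 𝑦𝑖 is a Bernoulli variable with distribution:

$$f(y_i \mid x_i, \beta) = [\Lambda(x_i'\beta)]^{y_i}\,[1 - \Lambda(x_i'\beta)]^{1-y_i}$$

Hence the log-likelihood is:

$$\ln L = \sum_{i=1}^{N} \left\{ y_i \ln \Lambda(x_i'\beta) + (1 - y_i) \ln[1 - \Lambda(x_i'\beta)] \right\}$$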
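As a sketch of how the Logit log-likelihood can be maximised in practice, the code below applies Newton-Raphson with step halving to a hypothetical toy sample with a constant and one regressor. It relies on the fact that, for the logistic distribution, f/Λ = 1 − Λ and f/(1 − Λ) = Λ, so the first-order conditions reduce to Σ (y_i − Λ(x_i'β)) x_i = 0. The Hessian is then −Σ Λ(1 − Λ) x_i x_i'. The data, the starting values and the stopping rule are assumptions made for the example. Econometric software may use different algorithms.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data (an assumption for illustration).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def log_likelihood(b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def score_and_hessian(b0, b1):
    """Score sum (y - p) x and Hessian -sum p (1 - p) x x' for regressors (1, x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 -= w
        h01 -= w * xi
        h11 -= w * xi * xi
    return (g0, g1), (h00, h01, h11)

b0, b1 = 0.0, 0.0   # starting values
for iteration in range(50):
    (g0, g1), (h00, h01, h11) = score_and_hessian(b0, b1)
    if max(abs(g0), abs(g1)) < 1e-10:
        break
    # Newton direction: minus the inverse Hessian (2 x 2) times the score.
    det = h00 * h11 - h01 * h01
    d0 = -(h11 * g0 - h01 * g1) / det
    d1 = -(-h01 * g0 + h00 * g1) / det
    # Halve the step if it would lower the log-likelihood.
    step = 1.0
    current = log_likelihood(b0, b1)
    while step > 1e-8 and log_likelihood(b0 + step * d0, b1 + step * d1) < current - 1e-12:
        step /= 2.0
    b0 += step * d0
    b1 += step * d1

print("constant:", round(b0, 4), " slope:", round(b1, 4))
print("lnL at the solution:", round(log_likelihood(b0, b1), 4))
```

At the solution the score is numerically zero, which is the sample counterpart of the first-order conditions. The estimated slope is positive, as expected for data in which the ones cluster at high values of x.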
