Chapter 6: Limited Dependent Variable Models

6.1. The Linear Probability Model

The linear probability model belongs to the class of discrete choice (or dichotomous choice) models, in which the dependent variable takes only two values: 0 and 1. There are several methods to analyze regression models where the dependent variable is 0 or 1. The simplest method is least squares; in this case the model is called the linear probability model.

The other method assumes an underlying or latent variable y_i* which we do not observe:

y_i* = β'x_i + u_i    (6.1)

This is the idea behind the Logit and Probit models.

In this case the observed variable y_i is an indicator variable that denotes the occurrence or non-occurrence of an event. For instance, in analyzing the determinants of unemployment, we have data showing whether or not each person is employed, together with some explanatory variables that determine employment.

In regression form this is written as:

y_i = β'x_i + u_i    (6.2)

where y_i ∈ {0, 1}, and the conditional expectation E(y_i | x_i) = β'x_i, which is the probability that the event will occur given x_i.

Since y_i takes only two values, 0 and 1, the error u_i in the above equation can take only two values, 1 − β'x_i and −β'x_i.

The variance of u_i is:

var(u_i) = β'x_i (1 − β'x_i)    (6.3)

Using OLS would therefore result in a heteroskedasticity problem.

This problem can be overcome by using the following two-step estimation procedure.

1. Estimate y_i = β'x_i + u_i by OLS and obtain the fitted values ŷ_i.    (6.4)

2. Compute ŵ_i = ŷ_i(1 − ŷ_i) and use weighted least squares, i.e.,

y_i / √ŵ_i = β'(x_i / √ŵ_i) + u_i / √ŵ_i    (6.5)

Then regress y_i/√ŵ_i on x_i/√ŵ_i.
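As an illustrative sketch (not part of the chapter), the two-step procedure can be coded with NumPy; the synthetic dataset, the clipping of fitted values to keep the weights positive, and the function name lpm_wls are all assumptions for this example:

```python
import numpy as np

def lpm_wls(X, y, eps=1e-6):
    """Two-step WLS for the linear probability model (eqs. 6.4-6.5)."""
    # Step 1: OLS of y on X gives fitted probabilities y_hat.
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    p_hat = X @ b_ols
    # Fitted values outside (0, 1) would make w_i = p(1-p) non-positive;
    # clip them as a pragmatic fix (this is an assumption of the sketch).
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    w = np.sqrt(p_hat * (1.0 - p_hat))
    # Step 2: divide both sides by sqrt(w_i) and re-run least squares.
    b_wls, *_ = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)
    return b_ols, b_wls

# Synthetic binary data (an assumption for this sketch)
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (0.5 + 0.2 * X[:, 1] + rng.normal(scale=0.3, size=n) > 0.5).astype(float)
b_ols, b_wls = lpm_wls(X, y)
```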

However, the problems with this procedure (least squares or weighted least squares) are:

1. The estimated weight ŵ_i = ŷ_i(1 − ŷ_i) may be negative.
2. The errors u_i are not normally distributed, so there is a problem with the application of the usual tests of significance.
3. The conditional expectation E(y_i | x_i) is interpreted as the probability that the event will occur, but in many cases the fitted value can lie outside the limits (0, 1).

6.2. The Probit and Logit Models

An alternative approach is to assume the following regression model:

y_i* = β'x_i + u_i    (6.6)

where y_i* is not observed; it is commonly called a latent variable. What we observe is a dummy variable y_i defined by:

y_i = 1 if y_i* > 0, and y_i = 0 otherwise    (6.7)

The Probit and Logit models differ in the specification of the distribution of the error term u_i.

For instance, if the observed dummy variable indicates whether or not a person is employed, y_i* would be defined as the 'propensity or ability to find employment.'

Thus,

P(y_i = 1) = P(y_i* > 0) = P(u_i > −β'x_i) = 1 − F(−β'x_i)    (6.8)

where F is the cumulative distribution function of u_i.

a) The Probit Model:

The probability based on the cumulative standard normal distribution is given by:

P(Y = 1) = ∫ from −∞ to Z of (1/√(2π)) e^(−t²/2) dt = Φ(Z)    (6.9)

where Z = β₀ + β₁X₁ + β₂X₂ + ... + β_k X_k.

b) The Logit Model:

The cumulative logistic function for the Logit model is based on the concept of an odds ratio.

Let the log odds that Y = 1 be given by:

ln(p / (1 − p)) = β₀ + β₁X₁ + β₂X₂ + ... + β_k X_k    (6.10)
Solving for the probability that Y = 1 we get:

p / (1 − p) = e^Z    (6.11)

⇒ p = (1 − p) e^Z = e^Z − p e^Z
⇒ p (1 + e^Z) = e^Z
⇒ p = e^Z / (1 + e^Z) = 1 / (1 + e^(−Z))    (6.12)

The above logistic probability is simply denoted as Λ(Z).
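The algebra above reduces to a one-line function; the following NumPy sketch (function name assumed) computes P(Y = 1) = 1/(1 + e^(−Z)):

```python
import numpy as np

def logistic_prob(Z):
    # p = e^Z / (1 + e^Z) = 1 / (1 + e^{-Z})   (eqs. 6.11-6.12)
    return 1.0 / (1.0 + np.exp(-np.asarray(Z, dtype=float)))
```

At Z = 0 the odds p/(1 − p) equal 1, so the probability is exactly 0.5.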

Both the Probit and Logit distributions are 'S'-shaped, but they differ in the relative thickness of their tails: the Logit has thicker tails than the Probit. This difference, however, tends to disappear as the sample size gets large.

The relationship between Z and P(Y = 1) can be represented through a 'latent' underlying index that determines choices. The latent index Z is determined in linear fashion by a set of independent variables X. In turn, the latent index Z determines P(Y = 1).

The Bernoulli trial of the Probit and Logit models conditional on Z is given by:

f(Y | Z) = P(Y = 1)^Y [1 − P(Y = 1)]^(1−Y)    (6.13)

Plugging either the standard normal cumulative distribution function (for Probit) or the cumulative logistic function (for Logit) into the above function gives the appropriate probability function:

f(Y_i | Z) = Φ(Z)^(Y_i) [1 − Φ(Z)]^(1−Y_i)    (6.14)

for the Probit model

f(Y_i | Z) = Λ(Z)^(Y_i) [1 − Λ(Z)]^(1−Y_i)    (6.15)

for the Logit model

The likelihood function for these models is given by:

L(β | Y_i, X_i) = ∏ from i=1 to n of Φ(z)^(Y_i) [1 − Φ(z)]^(1−Y_i)    (6.16)

for the Probit model

L(β | Y_i, X_i) = ∏ from i=1 to n of Λ(z)^(Y_i) [1 − Λ(z)]^(1−Y_i)    (6.17)

for the Logit model

The log likelihood function of these models is given as:

ln L(β | Y_i, X_i) = Σ from i=1 to n of [Y_i ln Φ(z) + (1 − Y_i) ln(1 − Φ(z))]    (6.18)

for Probit

ln L(β | Y_i, X_i) = Σ from i=1 to n of [Y_i ln Λ(z) + (1 − Y_i) ln(1 − Λ(z))]    (6.19)

for Logit

These functions can be optimized using standard methods to get the


parameter values.
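As a sketch of such an optimization (the synthetic data and function names are assumptions, not from the chapter), the Logit log likelihood in (6.19) can be maximized numerically with scipy.optimize:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def logit_negloglik(beta, X, y):
    # Negative of eq. 6.19, rewritten as -sum_i [y_i*z_i - ln(1 + e^{z_i})]
    # and computed with logaddexp for numerical stability.
    z = X @ beta
    return -np.sum(y * z - np.logaddexp(0.0, z))

# Synthetic data (an assumption for this sketch)
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n) < expit(X @ beta_true)).astype(float)

res = minimize(logit_negloglik, x0=np.zeros(2), args=(X, y), method="BFGS")
beta_hat = res.x
```

With 2,000 draws the estimates should land close to the true coefficients used to generate the data.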

In choosing between the Probit and Logit models, there is no statistical theory for preferring one over the other. The two models give quite similar results in large samples, although their estimates can differ noticeably in small samples.

However, the choice between the two models can be made on grounds of convenience: Probit probabilities are easy to compute from the standard normal (z) table, while the Logit is mathematically simpler.

The probability model in the form of a regression is:

E(Y | X) = 0 · [1 − F(β'X)] + 1 · F(β'X) = F(β'X)    (6.20)

Whatever distribution is used, the parameters of the model, like those of any other nonlinear regression model, are not necessarily the marginal effects:

∂E(Y | X)/∂X = [dF(β'X)/d(β'X)] β = f(β'X) β    (6.21)

where f(·) is the density function that corresponds to the cumulative distribution function F(·).

a) For the normal distribution, this is:

∂E(Y | X)/∂X = φ(β'X) β    (6.22)

where φ(·) is the standard normal density.

b) For the logistic distribution:

dΛ(β'X)/d(β'X) = e^(β'X) / [1 + e^(β'X)]² = Λ(β'X)[1 − Λ(β'X)]    (6.23)

∂E(Y | X)/∂X = Λ(β'X)[1 − Λ(β'X)] β    (6.24)

In interpreting the estimated model, the marginal effects are in most cases evaluated at the means of the regressors; in other instances, pertinent values chosen by the researcher are used.

For a binary independent variable, say X_k, the marginal effect can be computed as:

Prob(Y = 1 | X̄*, X_k = 1) − Prob(Y = 1 | X̄*, X_k = 0)    (6.25)

where X̄* denotes the means of all the other variables in the model.

Therefore, the marginal effects can be evaluated at the sample means of the data, or they can be evaluated at every observation and then averaged to represent the marginal effects. More generally, the marginal effects are given as in equation (6.21).
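Both evaluation strategies can be sketched for the Probit case of (6.22); the helper name and the toy data are assumptions for this example:

```python
import numpy as np
from scipy.stats import norm

def probit_marginal_effects(X, beta):
    """Probit marginal effects, eq. 6.22: dE[Y|X]/dX = phi(beta'X) * beta."""
    # Marginal effect at the means of the regressors
    mem = norm.pdf(X.mean(axis=0) @ beta) * beta
    # Average marginal effect: evaluate phi at every observation, then average
    ame = norm.pdf(X @ beta).mean() * beta
    return mem, ame

# Toy design matrix and coefficients (assumptions for this sketch)
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
beta = np.array([0.1, 0.2])
mem, ame = probit_marginal_effects(X, beta)
```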

6.3. Estimation of Binary Choice Models

The log likelihood function for the two models is:



log L = Σ_i { y_i log F(β'X_i) + (1 − y_i) log[1 − F(β'X_i)] }    (6.26)

The first-order condition with respect to the parameters of the model is given by:

∂log L/∂β = Σ_i [ y_i (f_i/F_i) − (1 − y_i) (f_i/(1 − F_i)) ] X_i = 0    (6.27)

where f_i is the density dF_i/d(β'X_i); the subscript i indicates that the function has argument β'X_i.
i) For the normal distribution (Probit), the log likelihood is:

log L = Σ over y_i=0 of log[1 − Φ(β'X_i)] + Σ over y_i=1 of log Φ(β'X_i)    (6.28)

∂log L/∂β = Σ over y_i=0 of [−φ_i/(1 − Φ_i)] X_i + Σ over y_i=1 of [φ_i/Φ_i] X_i = 0    (6.29)

ii) For the Logit model, the log likelihood is:

ln L(β | Y_i, X_i) = Σ from i=1 to n of [Y_i ln Λ(z) + (1 − Y_i) ln(1 − Λ(z))]    (6.30)

∂log L/∂β = Σ_i (y_i − Λ_i) X_i = 0


6.4. Measures of Goodness of fit

When the dependent variable is dichotomous, there is a problem with using the conventional R² as a measure of goodness of fit.

1. Measures based on likelihood ratios

Let L_UR be the maximum of the likelihood function when maximized with respect to all the parameters, and let L_R be the maximum when maximized with the restriction that all slope coefficients are zero. An R²-type measure is:

R² = 1 − (L_R / L_UR)^(2/n)    (6.31)

Cragg and Uhler (1970) suggested a pseudo-R² that lies between 0 and 1:

pseudo-R² = [L_UR^(2/n) − L_R^(2/n)] / {[1 − L_R^(2/n)] L_UR^(2/n)}    (6.32)

McFadden (1974) defined R² as:

R² = 1 − (ln L_UR / ln L_R)    (6.33)

R² can also be defined in terms of the proportion of correct predictions. After computing the fitted probability ŷ_i, we can classify an observation as belonging to group 1 if ŷ_i > 0.5 and to group 2 if ŷ_i ≤ 0.5,    (6.34)

and then count the number of correct predictions:

Count R² = number of correct predictions / total number of observations    (6.35)
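A minimal sketch of McFadden's R² and the count R² (function name and toy values assumed, not from the chapter), given fitted probabilities and the two maximized log likelihoods:

```python
import numpy as np

def fit_measures(y, p_hat, loglik_ur, loglik_r):
    """McFadden pseudo R^2 (eq. 6.33) and the count R^2 (eq. 6.35)."""
    # McFadden: 1 - ln L_UR / ln L_R (restricted = slopes set to zero)
    mcfadden = 1.0 - loglik_ur / loglik_r
    # Count R^2: classify as group 1 when the fitted probability > 0.5
    y_pred = (np.asarray(p_hat) > 0.5).astype(float)
    count_r2 = np.mean(y_pred == np.asarray(y))
    return mcfadden, count_r2

# Toy values (assumptions for this sketch)
y = np.array([1.0, 0.0, 1.0, 1.0])
p_hat = np.array([0.8, 0.3, 0.6, 0.4])
mcf, cr2 = fit_measures(y, p_hat, loglik_ur=-10.0, loglik_r=-20.0)
```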

Example: The regression results of a Probit model of house ownership on income are given below:

6.36

We want to measure the effect of a unit change in income on the probability of owning a house:

∂P(Y = 1)/∂(income) = φ(β'X) β_income    (6.37)

where φ(·) is the standard normal probability density function evaluated at β'X.

At the chosen value of income, the normal density function is equal to

φ(β'X) = 0.3066    (6.38)

Now multiplying this value by the slope coefficient of income, we get 0.01485.

Logit model of owning a house:

6.39

This means that for a unit increase in weighted income, the weighted log of the odds in favour of owning a house goes up by 0.08 units.

Converting this into an odds ratio, we take the antilog:

e^0.08 ≈ 1.083    (6.40)
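The antilog step can be checked numerically:

```python
import numpy as np

# A 0.08 rise in the log odds multiplies the odds by exp(0.08).
odds_ratio = np.exp(0.08)
# The odds of owning a house rise by roughly 8.3 percent.
```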

6.5. Maximum Likelihood Estimation

For the linear regression model, consider the MLE of a normal variable y_i conditional on X_i with mean β'X_i and variance σ². The pdf for an observation is:

f(y_i | X_i) = [1/√(2πσ²)] exp[−(y_i − β'X_i)²/(2σ²)]    (6.41)

The pdf of a normal variable with mean β'X_i and variance σ² is often expressed in terms of the pdf of a standardized normal variable with mean 0 and variance of 1:

φ(z) = [1/√(2π)] e^(−z²/2)    (6.42)

Thus, f(y_i | X_i) = (1/σ) φ((y_i − β'X_i)/σ)    (6.43)

The likelihood can be written as:

L(β, σ² | y, X) = ∏ from i=1 to n of (1/σ) φ((y_i − β'X_i)/σ)    (6.44)

6.6. Limited Dependent Variables

The density function of a normally distributed variable y with mean μ and variance σ² is given by:

f(y) = [1/(σ√(2π))] exp[−(y − μ)²/(2σ²)]    (6.45)

where y ~ N(μ, σ²).

For a standard normal distribution, z = (y − μ)/σ ~ N(0, 1).    (6.46)

The density of a standard normal variable is:

φ(z) = [1/√(2π)] e^(−z²/2)    (6.47)

The cumulative distribution function of a normal distribution is:

Φ(z) = ∫ from −∞ to z of φ(t) dt    (6.48)

Due to symmetry, Φ(−z) = 1 − Φ(z).

In limited dependent variable models we may encounter some form of truncation.

If y has density f(y), the distribution of y truncated from below at a given point c is given by:

f(y | y > c) = f(y)/P(y > c) if y > c, and 0 otherwise    (6.49)

If y is a standard normal variable, the truncated distribution has the density:

f(y | y > c) = φ(y)/[1 − Φ(c)]    (6.50)

If the distribution is truncated from above at c, the denominator is Φ(c) instead.

If y has a normal distribution with mean μ and variance σ², the truncated distribution has mean:

E(y | y > c) = μ + σ λ(α)    (6.51)

where α = (c − μ)/σ

and λ(α) = φ(α)/[1 − Φ(α)].
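The truncated mean formula (6.51) can be sketched and cross-checked against scipy.stats.truncnorm (the function name is an assumption of this sketch):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def truncated_mean(mu, sigma, c):
    """E[y | y > c] for y ~ N(mu, sigma^2), eq. 6.51:
    mu + sigma * lambda(alpha), where alpha = (c - mu)/sigma and
    lambda(alpha) = phi(alpha) / (1 - Phi(alpha)) is the hazard function."""
    alpha = (c - mu) / sigma
    lam = norm.pdf(alpha) / (1.0 - norm.cdf(alpha))
    return mu + sigma * lam
```

For a standard normal truncated at zero this gives E[y | y > 0] = φ(0)/0.5 = √(2/π) ≈ 0.798.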

6.6.1. Tobit (Censored Regression) Model

In certain applications the dependent variable is continuous, but its range may be constrained. Most commonly this occurs when the dependent variable is zero for a substantial part of the population but positive for the rest of the population.

y_i = y_i* if y_i* > 0, and y_i = 0 if y_i* ≤ 0    (6.52)

where y_i* = β'x_i + u_i, with u_i ~ N(0, σ²).

In this model all negative values are mapped to zero, i.e. observations are censored (from below) at zero.

The model describes two things:

1. The probability that y_i = 0, given x_i:

P(y_i = 0) = P(u_i ≤ −β'x_i) = 1 − Φ(β'x_i/σ)    (6.53)

2. The distribution of y_i given that it is positive. This is a truncated normal distribution with expectation:

E(y_i | y_i > 0) = β'x_i + σ φ(β'x_i/σ)/Φ(β'x_i/σ)    (6.54)

The last term shows the conditional expectation of a mean-zero normal error given that it is larger than −β'x_i. The conditional expectation of y_i no longer equals β'x_i but depends nonlinearly on x_i through the ratio φ/Φ.
Marginal effects of the Tobit Model

1. The probability of a zero outcome is:

P(y_i = 0) = 1 − Φ(β'x_i/σ)    (6.55)

2. The expected value of y_i (accounting for both zero and positive outcomes) is:

E(y_i) = β'x_i Φ(β'x_i/σ) + σ φ(β'x_i/σ)    (6.56)

Thus the marginal effect on the expected value of y_i of a change in x_k is given by:

∂E(y_i)/∂x_k = β_k Φ(β'x_i/σ)    (6.57)

This means the marginal effect of a change in x_k upon the expected outcome is given by the model's coefficient multiplied by the probability of having a positive outcome.

3. The marginal effect upon the latent variable y_i* is simply β_k.

Maximum Likelihood Estimation of the Tobit Model

The contribution of an observation to the likelihood either equals the probability mass at the observed point y_i = 0, or the conditional density of y_i, given that it is positive, times the probability mass of observing y_i > 0:

L = ∏ over y_i=0 of P(y_i = 0) × ∏ over y_i>0 of f(y_i | y_i > 0) P(y_i > 0)    (6.58)

Using the appropriate expressions for the normal distribution we obtain:

L(β, σ²) = ∏ over y_i=0 of [1 − Φ(β'x_i/σ)] × ∏ over y_i>0 of (1/σ) φ((y_i − β'x_i)/σ)    (6.59)

Maximizing this function with respect to the parameters will give the maximum likelihood estimates.
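A sketch of the Tobit log likelihood in (6.59); the function name and the test values are assumptions of this example:

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(params, X, y):
    """Tobit log likelihood, eq. 6.59 (in logs): censored observations
    (y = 0) contribute ln Phi(-x'b/sigma) = ln[1 - Phi(x'b/sigma)];
    positive observations contribute ln[(1/sigma) phi((y - x'b)/sigma)]."""
    beta, sigma = params[:-1], params[-1]
    xb = X @ beta
    zero = y <= 0.0
    ll_zero = norm.logcdf(-xb[zero] / sigma).sum()
    ll_pos = (norm.logpdf((y[~zero] - xb[~zero]) / sigma) - np.log(sigma)).sum()
    return ll_zero + ll_pos

# Tiny check with assumed values: one censored and two positive observations
X = np.ones((3, 1))
y = np.array([0.0, 1.0, 2.0])
ll = tobit_loglik(np.array([0.5, 1.0]), X, y)
```

This function can be handed to a numerical optimizer (as in the binary-choice case) to obtain the ML estimates of β and σ.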

6.6.2. Sample Selection

The Tobit model imposes a structure that is often restrictive: exactly the same variables that affect the probability of a nonzero observation determine the level of a positive observation, and moreover with the same sign. This implies, for example, that those who are more likely to spend a positive amount are, on average, also those who spend more on durable goods.

For example, we might be interested in explaining wages. Obviously


wages are observed for people who are actually working, but we might be interested in (potential) wages not conditional on this selection.

For example, a change in some variable may lower someone’s wage


such that he decides to stop working. Consequently, his wage would not
be observed and the effect of this variable could be underestimated from
the available data.

Because a sample of workers may not be a random sample of the population (of potential workers), one may expect that people with lower (potential) wages are more likely to be unemployed. This problem is often referred to as sample selection.

Consider the following sample selection model of wages:

w_i = β'x_i + ε_i    (6.60)

where x_i denotes a vector of exogenous characteristics of the person and w_i denotes the person's wage.

The wage is not observable for people who are not working.

Thus, to describe whether a person is working or not, a second equation is specified, which is of the binary choice type:

h_i* = γ'z_i + η_i    (6.61)

where

h_i = 1 if h_i* > 0 (working), and h_i = 0 otherwise (not working)    (6.62)

The binary variable h_i indicates working or not working. The error terms ε_i and η_i of the two equations have means of zero, variances σ_ε² and σ_η² respectively, and covariance σ_εη.

One usually imposes the normalization restriction σ_η² = 1 for the Probit model. The conditional expected wage, given that a person is working, is given by:

E(w_i | h_i = 1) = β'x_i + σ_εη φ(γ'z_i)/Φ(γ'z_i)    (6.63)

The conditional expected wage equals β'x_i only if σ_εη = 0. So if the error terms of the two equations are not correlated, the wage equation can be consistently estimated by OLS.

A sample selection bias in OLS arises if σ_εη ≠ 0.

The term φ(γ'z_i)/Φ(γ'z_i) is known as the inverse Mills ratio, denoted λ(γ'z_i) by Heckman (1979); the resulting two-step estimator is referred to as Heckman's model.
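The inverse Mills ratio itself is a simple function of the standard normal pdf and cdf; a minimal sketch (function name assumed):

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(z):
    """Heckman's inverse Mills ratio: lambda(z) = phi(z) / Phi(z),
    the selection-correction term in the conditional wage equation."""
    return norm.pdf(z) / norm.cdf(z)
```

In the two-step procedure, this term, evaluated at the first-stage Probit index γ'z_i, is added as an extra regressor in the wage equation.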
