Fernando, Logit Tobit Probit March 2011
Introduction
In social science research, categorical data are often collected through surveys.
Categorical: nominal and ordinal variables. They take only a few values that do NOT have a metric.
A) Binary Case
Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
- The labor force status of a person.
- Voting behavior of a person (in favor of a new policy).
- Whether a person got married or divorced.
- Whether a person was involved in criminal behaviour, etc.
With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi of explanatory variables, the probability that yi takes the value 1 is modeled as
$$P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i)$$
The transformation function F is crucial. It maps the linear combination into [0,1] and satisfies in general $F(-\infty) = 0$, $F(+\infty) = 1$, and $\partial F(z)/\partial z > 0$ [that is, it is a cumulative distribution function].
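When the transformation function F is the logistic function, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$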
And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$
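As a rough illustration of the two transformation functions, the sketch below evaluates the logit and probit response probabilities for one observation. The coefficient vector and regressor values are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from any fitted model.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logit transformation F(z) = e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def normal_cdf(z: float) -> float:
    """Probit transformation: standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(x, beta):
    """z_i = x_i' beta, where x_i carries a leading 1 for the intercept."""
    return sum(xk * bk for xk, bk in zip(x, beta))

# Hypothetical values, for illustration only
beta = [-1.0, 0.5]   # intercept and one slope
x_i = [1.0, 3.0]     # constant term and one regressor
z_i = linear_index(x_i, beta)   # -1.0 + 0.5 * 3.0

print(f"z_i = {z_i:.2f}")
print(f"logit  P(y=1|x) = {logistic_cdf(z_i):.4f}")
print(f"probit P(y=1|x) = {normal_cdf(z_i):.4f}")
```

Both functions return 0.5 at z = 0 and approach 0 and 1 in the tails, which is the cdf behaviour required of F.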
The Logit and Probit models are almost identical (see the figure on the next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).
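The closeness of the two curves can be checked numerically. The sketch below compares the standard normal cdf with a logistic cdf whose argument is rescaled by 1.702. That constant is a commonly quoted approximation and an assumption of this sketch, not something taken from the slides. The rescaling is needed because the two distributions have different variances, so logit and probit coefficients are on different scales.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic cdf, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def normal_cdf(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.702  # assumed rescaling constant (commonly quoted approximation)

# Evaluate both curves on a grid from -4 to 4 in steps of 0.01
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)

print(f"largest gap between the curves on [-4, 4]: {max_gap:.4f}")
```

On this grid the two predicted probabilities never differ by as much as one percentage point, which is why the choice between the models rarely changes substantive conclusions.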
- In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the probit of success) increases by b units. [Or, we can also say that a one-unit increase in X changes Z by b standard deviation units.]
- If you like, you can convert the z-score to probabilities using the normal table.
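The conversion from z-scores to probabilities described in the bullets above can be sketched in a few lines, with the error function standing in for the normal table. The intercept a and slope b are hypothetical values, not estimates from any data set.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cdf: the value one would read off a normal table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_z(a: float, b: float, x: float) -> float:
    """Z-score (probit) for a single regressor: z = a + b * x."""
    return a + b * x

a, b = -1.0, 0.5   # hypothetical intercept and slope

rows = []
for x in range(0, 5):
    z = probit_z(a, b, x)
    rows.append((x, z, normal_cdf(z)))
    print(f"X = {x}   z = {z:+.2f}   P(y=1) = {normal_cdf(z):.4f}")
```

Each unit increase in X moves z by exactly b, but the implied change in probability differs from row to row because the normal cdf is nonlinear.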
Polytomous Case
When the variable is really ordinal, we use cumulative ordinal logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below. Example: Suppose Y has 4 categories; then,
$$\text{logit}(p_1) = \ln\{p_1 / (1 - p_1)\} = a_1 + bX$$
$$\text{logit}(p_1 + p_2) = \ln\{(p_1 + p_2) / (1 - p_1 - p_2)\} = a_2 + bX$$
$$\text{logit}(p_1 + p_2 + p_3) = \ln\{(p_1 + p_2 + p_3) / (1 - p_1 - p_2 - p_3)\} = a_3 + bX$$
Since these are cumulative logits, the probabilities are attached to being in category j and lower. Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
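To make the cumulative logits concrete, the sketch below turns three intercepts and one slope into the four category probabilities. It assumes the parameterization logit P(Y <= j) = a_j + bX used in the example; the numerical values of the intercepts, the slope, and X are hypothetical.

```python
import math

def inv_logit(v: float) -> float:
    """Inverse of the logit: maps a cumulative logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def category_probabilities(intercepts, b, x):
    """Return (cumulative, per-category) probabilities under proportional odds.

    Assumes logit P(Y <= j) = a_j + b * x with a_1 < a_2 < ... .
    """
    cumulative = [inv_logit(a_j + b * x) for a_j in intercepts]
    cumulative.append(1.0)  # P(Y <= last category) is 1 by definition
    per_category = [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]
    return cumulative, per_category

# Hypothetical parameters for a 4-category outcome
a = [-1.0, 0.0, 1.5]
b = 0.4

cum, probs = category_probabilities(a, b, x=1.0)
for j, (c, p) in enumerate(zip(cum, probs), start=1):
    print(f"category {j}: P(Y <= {j}) = {c:.4f}   P(Y = {j}) = {p:.4f}")
```

Because only the intercepts differ across the three equations, a one-unit change in X multiplies the odds of every cumulative split by the same factor exp(b). That equality is the proportionality assumption to be tested.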
Ordinal Logistic
$a_1$, $a_2$, $a_3$ are the intercepts; they satisfy $a_1 < a_2 < a_3$ and are interpreted as thresholds of the latent variable. Interpretation of parameter estimates depends on the software used! Check the software manual. If the RHS = a + bX, a positive coefficient is associated more with lower-order categories and a negative coefficient is associated more with higher-order categories. If the RHS = a - bX, a negative coefficient is associated more with lower-order categories, and a positive coefficient is associated more with higher-order categories.
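The effect of the sign convention can be seen by evaluating the same hypothetical coefficient under both parameterizations. The sketch below computes P(Y <= 1) at X = 0 and X = 1; the function name, the `convention` argument, and the parameter values are illustrative assumptions.

```python
import math

def inv_logit(v: float) -> float:
    """Inverse logit: maps a linear predictor to a cumulative probability."""
    return 1.0 / (1.0 + math.exp(-v))

def prob_at_or_below(a_j: float, b: float, x: float, convention: str) -> float:
    """P(Y <= j) under RHS = a + bX ('plus') or RHS = a - bX ('minus')."""
    if convention == "plus":
        eta = a_j + b * x
    elif convention == "minus":
        eta = a_j - b * x
    else:
        raise ValueError("convention must be 'plus' or 'minus'")
    return inv_logit(eta)

a1, b = -0.5, 0.8   # hypothetical first threshold and a positive slope

for convention in ("plus", "minus"):
    p_x0 = prob_at_or_below(a1, b, 0.0, convention)
    p_x1 = prob_at_or_below(a1, b, 1.0, convention)
    direction = "rises" if p_x1 > p_x0 else "falls"
    print(f"RHS = a {'+' if convention == 'plus' else '-'} bX: "
          f"P(Y <= 1) {direction} from {p_x0:.4f} to {p_x1:.4f}")
```

With a positive b, the probability of the lowest category rises with X under a + bX and falls under a - bX, so the same printed coefficient carries opposite meanings in the two conventions.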
The Tobit model is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term censoring with the one used in dynamic modeling.]
This is an unusual function; it consists of two terms: the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
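The two-part structure can be sketched as a log-likelihood function. The version below assumes the standard Tobit setup with normal errors and left-censoring at a known limit (zero by default). The function names and the tiny data set are hypothetical, and the linear predictions xb are passed in directly rather than estimated.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z: float) -> float:
    """Standard normal cdf; erfc keeps more precision in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(y, xb, sigma, lower=0.0):
    """Log-likelihood of a left-censored Tobit model (assumed standard form).

    Uncensored observations contribute the log of the normal pdf;
    censored observations contribute the log of the normal cdf.
    """
    total = 0.0
    for y_i, xb_i in zip(y, xb):
        if y_i > lower:
            # pdf term: density of y_i around its linear prediction
            total += math.log(normal_pdf((y_i - xb_i) / sigma) / sigma)
        else:
            # cdf term: probability that the latent value falls at or below the limit
            total += math.log(normal_cdf((lower - xb_i) / sigma))
    return total

# Hypothetical data: two censored observations (y = 0) and two uncensored ones
y = [0.0, 1.2, 0.0, 2.5]
xb = [-0.5, 1.0, 0.3, 2.0]
print(f"log-likelihood = {tobit_loglik(y, xb, sigma=1.0):.4f}")
```

Maximizing this function over the coefficients and sigma is what a Tobit routine does; the sketch only evaluates it at fixed values.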
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum available at http://www.stata-press.com/data/imeus/womenwk.dta
Descriptive Statistics

                         N   Minimum   Maximum      Mean   Std. Deviation
age                   2000        20        59     36.21            8.287
education             2000        10        20     13.08            3.046
married               2000         0         1       .67             .470
children              2000         0         5      1.64            1.399
wagefull              2000     -1.68     45.81   21.3118          7.01204
wage                  1343      5.88     45.81   23.6922          6.30537
lw                    1343      1.77      3.82    3.1267           .28651
work                  2000         0         1       .67             .470
lwf                   2000       .00      3.82    2.0996          1.48752
Valid N (listwise)    1343
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
S.E. .058 .098 .742 .764 .007 .019 .126 .052 .332
df 1 1 1 1 1
-4.159
Binary Probit Regression (in SPSS, use the ordinal regression menu and select probit link function. Ignore the test of parallel lines, etc.)
Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only            1645.024
Final                     1166.702      478.322    4   .000

Link function: Probit.
Parameter Estimates

                                                                        95% Confidence Interval
                          Estimate   Std. Error      Wald   df   Sig.   Lower Bound   Upper Bound
Threshold  [work = 0]        2.037         .209    94.664    1   .000         1.626         2.447
Location   age                .035         .004    67.301    1   .000          .026          .043
           education          .058         .011    28.061    1   .000          .037          .080
           children           .447         .029   243.907    1   .000          .391          .503
           [married=0]       -.431         .074    33.618    1   .000         -.577         -.285
           [married=1]          0a            .         .    0      .             .             .

Link function: Probit.
a. This parameter is set to zero because it is redundant.
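The estimates in the table can be turned into a predicted probability of working. The sketch below assumes the usual ordinal-regression parameterization, in which the probit of P(work = 0) equals the threshold minus the location terms, so that P(work = 1) is the normal cdf of the location terms minus the threshold. That reading of the SPSS output is an assumption and should be checked against the software manual. The example person is hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates copied from the Parameter Estimates table
THRESHOLD_WORK0 = 2.037
COEF = {"age": 0.035, "education": 0.058, "children": 0.447, "married0": -0.431}

def prob_work(age, education, children, married):
    """P(work = 1), assuming probit P(work = 0) = threshold - location."""
    location = (COEF["age"] * age
                + COEF["education"] * education
                + COEF["children"] * children
                + (0.0 if married else COEF["married0"]))  # married=1 is the reference
    return normal_cdf(location - THRESHOLD_WORK0)

# Hypothetical person: 35 years old, 12 years of education, 2 children
p_married = prob_work(35, 12, 2, married=True)
p_single = prob_work(35, 12, 2, married=False)
print(f"P(work=1 | married)     = {p_married:.3f}")
print(f"P(work=1 | not married) = {p_single:.3f}")
```

Under this reading, being unmarried lowers the z-score by 0.431, and the normal cdf translates that shift into a difference in probability.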
Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands. First, fit simple OLS Regression of the variable lwf (just to check)
------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------
Tobit regression
------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations
                         0 right-censored observations
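One way to read the Tobit estimates is to compute, for a given person, the latent index, the probability of being uncensored, and the expected observed outcome. The sketch below uses the standard Tobit formulas for left-censoring at zero, P(lwf > 0) = Φ(xb/σ) and E[lwf] = Φ(xb/σ)·xb + σ·φ(xb/σ), which are assumptions not stated on the slides. The example person is hypothetical.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Coefficients and sigma copied from the Tobit output
COEF = {"age": 0.052157, "married": 0.4841801, "children": 0.4860021,
        "education": 0.1149492, "_cons": -2.807696}
SIGMA = 1.872811

def tobit_summary(age, married, children, education):
    """Return (latent index, P(lwf > 0), E[lwf]) for left-censoring at zero."""
    xb = (COEF["_cons"] + COEF["age"] * age + COEF["married"] * married
          + COEF["children"] * children + COEF["education"] * education)
    z = xb / SIGMA
    p_uncensored = normal_cdf(z)
    expected = p_uncensored * xb + SIGMA * normal_pdf(z)
    return xb, p_uncensored, expected

# Hypothetical person: 35 years old, married, 2 children, 12 years of education
xb, p_pos, e_lwf = tobit_summary(age=35, married=1, children=2, education=12)
print(f"latent index xb = {xb:.4f}")
print(f"P(lwf > 0)      = {p_pos:.4f}")
print(f"E[lwf]          = {e_lwf:.4f}")
```

The Tobit coefficients describe the latent index xb, so they are not directly comparable with the OLS coefficients, which describe the observed, censored outcome.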