
Logit, Probit and Tobit: Models for Categorical and Limited Dependent Variables

By Rajulton Fernando

Presented at PLCS/RDC Statistics and Data Series at Western

March 23, 2011

Introduction
In social science research, categorical data are often collected through surveys.
Categorical variables are either nominal or ordinal. They take only a few values that do NOT have a metric.

A) Binary Case
Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
- The labor force status of a person.
- Voting behavior of a person (in favor of a new policy).
- Whether a person got married or divorced.
- Whether a person was involved in criminal behaviour, etc.

With such variables, we can build models that describe the response probabilities, say $P(y_i = 1)$, of the dependent variable $y_i$.
For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector $x_i$ of explanatory variables, the probability that $y_i$ takes value 1 is modeled as

$$P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i)$$

where $\beta$ is a (K+1)-dimensional column vector of parameters.

The transformation function F is crucial. It maps the linear combination $z_i = x_i'\beta$ into [0,1] and satisfies in general $F(-\infty) = 0$, $F(+\infty) = 1$, and $\partial F(z)/\partial z > 0$ [that is, it is a cumulative distribution function].

The Logit and Probit Models

When the transformation function F is the logistic function, the response probabilities are given by

$$P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$
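The two response functions can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the numeric values of `x` and `beta` are hypothetical, chosen so that the linear index is exactly zero.

```python
import math

def linear_index(x, beta):
    """Linear combination z = x'beta (x includes a leading 1 for the intercept)."""
    return sum(xk * bk for xk, bk in zip(x, beta))

def logit_prob(x, beta):
    """P(y = 1 | x) when F is the logistic cdf."""
    z = linear_index(x, beta)
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(x, beta):
    """P(y = 1 | x) when F is the standard normal cdf, written via erf."""
    z = linear_index(x, beta)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values: intercept -1.0, slope 0.5, regressor value 2.0, so z = 0.
x = [1.0, 2.0]
beta = [-1.0, 0.5]
print(logit_prob(x, beta), probit_prob(x, beta))  # both cdfs equal 0.5 at z = 0
```

Both functions return 0.5 at z = 0 and approach 0 and 1 in the tails, which is the cdf property required of F.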

The Logit and Probit models are almost identical (see the Figure next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).

Source: J.S. Long, 1997


However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.

The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of $\beta$ is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term follows the assumed distribution (normal for probit, logistic for logit) and is homoscedastic. If homoscedasticity is violated, there is no easy solution.
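The scaling remark can be checked numerically. The sketch below compares the logistic cdf evaluated at 1.7·z with the standard normal cdf at z over a grid; the factor 1.7 is an assumed midpoint of the 1.6 to 1.8 range quoted above, and the grid is arbitrary.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.7  # assumed value inside the 1.6-1.8 range

# Evaluate both curves on z = -4.00, -3.99, ..., 4.00.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)
print("largest gap between the two curves:", max_gap)
```

Once the index is rescaled, the two curves differ by only about one percentage point at most, which is why the two models usually lead to the same substantive conclusions.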


Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when it is near zero (or near one) than when it is near the middle. Thus, it is a non-linear response function.
How to interpret the coefficients: in both models,
- If b > 0, p increases as X increases
- If b < 0, p decreases as X increases
As mentioned above, b cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends or descends changes according to the value of X. In other words, it is not a constant change as in ordinary regression. The greatest rate of change is at p = 0.5.
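For the logit model the slope of the response curve has a closed form, dp/dX = b·p·(1 − p), which makes the point about the changing rate concrete. The sketch below evaluates it at a few values of X; the intercept `a = 0` and slope `b = 1` are hypothetical.

```python
import math

def logit_p(a, b, x):
    """Logit response probability for a single regressor."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logit_slope(a, b, x):
    """Derivative dp/dx = b * p * (1 - p) of the logit response curve."""
    p = logit_p(a, b, x)
    return b * p * (1.0 - p)

a, b = 0.0, 1.0  # hypothetical intercept and slope
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(logit_p(a, b, x), 3), round(logit_slope(a, b, x), 3))
```

The slope peaks at b/4 where p = 0.5 and shrinks toward zero in both tails, so the same coefficient b implies different changes in probability at different starting points.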


In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of $e^b$ on the odds.
Example: If b = 0.25, then $e^{0.25}$ = 1.28. Thus, when X changes by one unit, the odds p/(1 − p) increase by a factor of 1.28, that is, by 28%. The change in p itself depends on the starting value of p.

- In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the probit of success) increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]
- If you like, you can convert the z-score to probabilities using the normal table.
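The b = 0.25 example can be worked through in code. The multiplicative factor applies to the odds, so the resulting change in the probability depends on where it starts; the starting probabilities below are hypothetical.

```python
import math

b = 0.25
odds_ratio = math.exp(b)  # multiplicative effect of a one-unit increase in X on the odds

def new_prob(p0, b):
    """Probability after a one-unit increase in X, starting from probability p0."""
    odds1 = (p0 / (1.0 - p0)) * math.exp(b)
    return odds1 / (1.0 + odds1)

print("odds ratio:", round(odds_ratio, 2))
for p0 in (0.1, 0.5, 0.9):  # hypothetical starting probabilities
    print(p0, "->", round(new_prob(p0, b), 4))
```

Starting from p = 0.5 the probability rises to about 0.56, a 12% relative increase, even though the odds rise by 28%.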

Models for Polytomous Data

B) Polytomous Case
Here we need to distinguish between purely nominal variables and really ordinal variables. When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.
Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates.

$$P(y = 1) = \frac{\exp(x'\beta_1)}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}$$

$$P(y = 2) = \frac{\exp(x'\beta_2)}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}$$

$$P(y = 3) = \frac{1}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}$$
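A small sketch of the three-category case, with the third category as reference. The coefficient vectors `beta1` and `beta2` and the covariate vector `x` are hypothetical values used only to show that the probabilities sum to one and that the log ratios to the reference category recover the linear indices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mnl_probs(x, beta1, beta2):
    """Probabilities of categories 1, 2, 3 with category 3 as the reference."""
    e1 = math.exp(dot(x, beta1))
    e2 = math.exp(dot(x, beta2))
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

# Hypothetical inputs: intercept plus one regressor.
x = [1.0, 2.0]
beta1 = [0.2, 0.3]
beta2 = [-0.5, 0.1]

p1, p2, p3 = mnl_probs(x, beta1, beta2)
print(p1, p2, p3)
print(math.log(p1 / p3), dot(x, beta1))  # the two numbers agree
```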

When the variable is really ordinal, we use cumulative ordinal logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below. Example: Suppose Y has 4 categories; then,
logit(p1) = ln{p1 / (1 − p1)} = a1 + bX
logit(p1 + p2) = ln{(p1 + p2) / (1 − p1 − p2)} = a2 + bX
logit(p1 + p2 + p3) = ln{(p1 + p2 + p3) / (1 − p1 − p2 − p3)} = a3 + bX

Since these are cumulative logits, the probabilities are attached to being in category j and lower. Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
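The cumulative logits can be turned back into category probabilities by differencing the cumulative probabilities. The sketch below uses the a + bX parameterization written above; the intercepts `cutpoints`, the slope `b`, and the values of X are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(cutpoints, b, x):
    """Category probabilities from cumulative logits logit P(Y <= j) = a_j + b*x."""
    cum = [logistic(a + b * x) for a in cutpoints]  # P(Y <= j) for j = 1..m-1
    cum.append(1.0)                                 # P(Y <= m) = 1
    probs = [cum[0]]
    for j in range(1, len(cum)):
        probs.append(cum[j] - cum[j - 1])
    return probs

cutpoints = [-1.0, 0.0, 1.0]  # hypothetical a1 < a2 < a3
b = 0.5                       # hypothetical common slope

probs_at_0 = ordinal_probs(cutpoints, b, 0.0)
probs_at_2 = ordinal_probs(cutpoints, b, 2.0)
print(probs_at_0)
print(probs_at_2)
```

With this sign convention a positive b raises every cumulative probability, so mass shifts toward the lower categories as X increases.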

Ordinal Logistic
a1, a2, a3 are the intercepts. They satisfy the property a1 < a2 < a3 and are interpreted as thresholds of the latent variable.
Interpretation of parameter estimates depends on the software used! Check the software manual.
If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
If the RHS = a − bX, a negative coefficient is more associated with lower ordered categories, and a positive coefficient is more associated with higher ordered categories.

Model for Limited Dependent Variable

C) Tobit Model
This model is for a metric dependent variable that is limited, in the sense that we observe it only if it is above or below some cut-off level. For example:
- Wages may be limited from below by the minimum wage
- The donation amount given to charity
- Top coding income at, say, $300,000
- Time use and leisure activity of individuals
- Extramarital affairs

It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term censoring with the one used in dynamic modeling.]

The Tobit Model

The model is called Tobit because it was first proposed by Tobin (1958) and involves aspects of Probit analysis; the term was coined by Goldberger for "Tobin's Probit". Reasoning behind:
- If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
- If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
- The degree of bias in both will increase as the number of observations that take on the value of zero increases. (see Figure next slide)

Source: J.S. Long


The Tobit model uses all of the information, including information on censoring, and provides consistent estimates. It is also a nonlinear model and similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model takes the form:
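As a sketch of that form, assuming left-censoring at zero and a latent error distributed N(0, σ²) (both are assumptions here, chosen to match the `ll(0)` Stata example later in these slides), the likelihood can be written as

$$L(\beta, \sigma) = \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \;\prod_{y_i = 0} \Phi\!\left(\frac{-x_i'\beta}{\sigma}\right)$$

where $\phi$ and $\Phi$ denote the standard normal pdf and cdf, and $\Phi(-x_i'\beta/\sigma) = 1 - \Phi(x_i'\beta/\sigma)$ is the probability that the latent variable falls at or below zero.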

This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
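The two-part structure is easy to see in code. The sketch below evaluates a tobit log-likelihood for given parameter values, assuming left-censoring at zero and a normal latent error (assumptions made for illustration); it does not maximize the function, and the tiny data set is hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood for a tobit model left-censored at 0 (assumed setup)."""
    ll = 0.0
    for yi, xi in zip(y, X):
        mu = sum(xk * bk for xk, bk in zip(xi, beta))
        if yi > 0:
            # Uncensored observation: density of y around its latent mean.
            ll += math.log(norm_pdf((yi - mu) / sigma) / sigma)
        else:
            # Censored observation: probability that the latent y* is <= 0.
            ll += math.log(norm_cdf(-mu / sigma))
    return ll

# Hypothetical data: one censored and one uncensored observation, intercept only.
y = [0.0, 1.0]
X = [[1.0], [1.0]]
ll = tobit_loglik(y, X, beta=[0.0], sigma=1.0)
print(ll)
```

An ML routine would search over `beta` and `sigma` for the values that make this sum as large as possible.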


The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model. But such an interpretation may not be useful since we are interested in the effect of X on the observable y (or change in the censored outcome).
It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), that is, the probability of being uncensored. Since this probability is a fraction, the marginal effect is actually attenuated. In the above, a and b denote lower and upper censoring points. For example, in left censoring, the limits will be: a = 0, b = +∞.
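The attenuation can be sketched as follows, assuming a normally distributed latent variable with mean `mu` (the linear index at the point of evaluation) and standard deviation `sigma`. All numeric values are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_uncensored(mu, sigma, a=-math.inf, b=math.inf):
    """Pr(a < y* < b) for y* ~ N(mu, sigma^2); a and b are the censoring points."""
    return norm_cdf((b - mu) / sigma) - norm_cdf((a - mu) / sigma)

def observed_marginal_effect(beta_j, mu, sigma, a=-math.inf, b=math.inf):
    """Effect of x_j on the observed (censored) y: coefficient times Pr(uncensored)."""
    return beta_j * prob_uncensored(mu, sigma, a, b)

# Hypothetical case: left-censoring at 0 (a = 0, b = +inf), latent mean 0, sigma 1.
print(prob_uncensored(0.0, 1.0, a=0.0))                # half the mass is uncensored
print(observed_marginal_effect(2.0, 0.0, 1.0, a=0.0))  # coefficient 2.0 is halved
```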

Illustrations for logit, probit and tobit models, using womenwk.dta from Baum available at http://www.stata-press.com/data/imeus/womenwk.dta
Descriptive Statistics

Variable                 N   Minimum   Maximum      Mean   Std. Deviation
age                   2000        20        59     36.21            8.287
education             2000        10        20     13.08            3.046
married               2000         0         1       .67             .470
children              2000         0         5      1.64            1.399
wagefull              2000     -1.68     45.81   21.3118          7.01204
wage                  1343      5.88     45.81   23.6922          6.30537
lw                    1343      1.77      3.82    3.1267           .28651
work                  2000         0         1       .67             .470
lwf                   2000       .00      3.82    2.0996          1.48752
Valid N (listwise)    1343

Binary Logistic Regression

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1              2055.829a                   .212                  .295

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1           6.491    8   .592

Variables in the Equation

                          B    S.E.      Wald   df   Sig.   Exp(B)
Step 1a   age           .058   .007    64.359    1   .000    1.060
          education     .098   .019    27.747    1   .000    1.103
          married       .742   .126    34.401    1   .000    2.100
          children      .764   .052   220.110    1   .000    2.148
          Constant    -4.159   .332   156.909    1   .000     .016

a. Variable(s) entered on step 1: age, education, married, children.
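The Exp(B) column is simply the exponential of each coefficient, and the fitted equation can be evaluated at any covariate values. The sketch below assumes the B column reads .058, .098, .742, .764 and −4.159 (a reading of the flattened SPSS table, in which B and S.E. appear interleaved) and plugs in the sample means from the descriptive statistics.

```python
import math

# Coefficients as read from the SPSS table (assumed reading of the B column).
B = {"age": 0.058, "education": 0.098, "married": 0.742, "children": 0.764}
constant = -4.159
reported_expB = {"age": 1.060, "education": 1.103, "married": 2.100, "children": 2.148}

# Odds ratios implied by the coefficients.
expB = {name: math.exp(b) for name, b in B.items()}
for name in B:
    print(name, round(expB[name], 3), reported_expB[name])

# Predicted probability of working at the sample means (from Descriptive Statistics).
means = {"age": 36.21, "education": 13.08, "married": 0.67, "children": 1.64}
z = constant + sum(B[name] * means[name] for name in B)
p_at_means = 1.0 / (1.0 + math.exp(-z))
print("P(work = 1) at the means:", round(p_at_means, 3))
```

Small differences from the reported Exp(B) values arise because the coefficients are rounded to three decimals.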

Binary Probit Regression (in SPSS, use the ordinal regression menu and select probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only            1645.024
Final                     1166.702      478.322    4   .000

Link function: Probit.

Parameter Estimates

                                                                   95% Confidence Interval
                         Estimate   Std. Error      Wald   df   Sig.   Lower Bound   Upper Bound
Threshold  [work = 0]       2.037         .209    94.664    1   .000         1.626         2.447
Location   age               .035         .004    67.301    1   .000          .026          .043
           education         .058         .011    28.061    1   .000          .037          .080
           children          .447         .029   243.907    1   .000          .391          .503
           [married=0]      -.431         .074    33.618    1   .000         -.577         -.285
           [married=1]         0a            .         .    0      .             .             .

Link function: Probit.
a. This parameter is set to zero because it is redundant.
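The logit and probit slope estimates from the two SPSS runs can be compared directly. The sketch below assumes the logit B column reads .058, .098, .742 and .764, and takes the probit effect of being married as +.431, the negative of the reported `[married=0]` coefficient.

```python
# Slope estimates copied from the SPSS output (assumed reading of the logit B column).
logit_b = {"age": 0.058, "education": 0.098, "married": 0.742, "children": 0.764}
probit_b = {"age": 0.035, "education": 0.058, "married": 0.431, "children": 0.447}

ratios = {name: logit_b[name] / probit_b[name] for name in logit_b}
for name, r in ratios.items():
    print(name, round(r, 2))
```

Each ratio falls between 1.6 and 1.8, in line with the scaling rule stated earlier in these slides.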

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands. First, fit simple OLS Regression of the variable lwf (just to check)

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)

Tobit regression                                  Number of obs   =       2000
                                                  LR chi2(4)      =     461.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -3349.9685                       Pseudo R2       =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations
                         0 right-censored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
      y  = Pr(lwf>0) (predict, pr(0,.))
         =  .81920975
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952   36.208
 married*|   .0706994      .01576    4.48   0.000   .039803  .101596    .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663   1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
      y  = E(lwf|lwf>0) (predict, e(0,.))
         =  2.3102021
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829   36.208
 married*|   .2861047      .05982    4.78   0.000   .168855  .403354    .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852   1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
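The marginal effects reported by `mfx` for the continuous regressors can be reproduced by hand from the tobit coefficients, /sigma, and the covariate means in the X column. The sketch below uses the standard formulas for a tobit model left-censored at zero: the effect on Pr(y > 0) is φ(z)·b/σ and the effect on E(y | y > 0) is b·[1 − λ(z + λ)], where z = x'β/σ and λ = φ(z)/Φ(z). The last function, the effect on the observed censored outcome, is not among the `mfx` results shown here and is included only to illustrate the attenuation rule.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Tobit estimates and covariate means copied from the Stata output.
coef = {"age": 0.052157, "married": 0.4841801,
        "children": 0.4860021, "education": 0.1149492}
cons = -2.807696
sigma = 1.872811
means = {"age": 36.208, "married": 0.6705, "children": 1.6445, "education": 13.084}

xb = cons + sum(coef[k] * means[k] for k in coef)  # latent index at the means
z = xb / sigma
pr_uncensored = norm_cdf(z)            # Pr(lwf > 0)
lam = norm_pdf(z) / pr_uncensored      # inverse Mills ratio
e_conditional = xb + sigma * lam       # E(lwf | lwf > 0)

def me_probability(b):
    """Effect of a continuous regressor on Pr(y > 0)."""
    return norm_pdf(z) * b / sigma

def me_conditional(b):
    """Effect of a continuous regressor on E(y | y > 0)."""
    return b * (1.0 - lam * (z + lam))

def me_observed(b):
    """Effect on the censored outcome E(y): coefficient times Pr(uncensored)."""
    return b * pr_uncensored

print("Pr(lwf>0):", pr_uncensored, " E(lwf|lwf>0):", e_conditional)
for name in ("age", "children", "education"):
    b = coef[name]
    print(name, me_probability(b), me_conditional(b), me_observed(b))
```

The dummy variable married is left out of the loop because `mfx` reports a discrete 0-to-1 change for it rather than a derivative.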
