Lecture 7 - Binary Models
Nguyen Quang
quangn@ueh.edu.vn
BINARY DEPENDENT VARIABLE
• Sometimes the dependent variable under consideration is binary:
  • Whether a loan application is approved
  • Whether a borrower can repay a loan
  • Whether a person has a credit card
  • ...
EXAMPLE: COVID-19 VACCINE PURCHASE
DATA IS MADE AVAILABLE BY EEPSEA
• Problem: the decision to vaccinate oneself with a hypothetical COVID-19 vaccine
• Data: a survey of 377 individuals in Ho Chi Minh City (HCMC) in 2020
• Data file: EMP4.xlsx
• Dependent variable:
  ▪ dself: 1 = decides to vaccinate oneself, 0 = otherwise
• Regressors:
  ▪ efficacy80: 1 = vaccine efficacy is 80%, 0 = 50%
  ▪ duration3: 1 = effectiveness lasts 3 years, 0 = 1 year
  ▪ priceUS (USD per 2-dose vaccine): price of the vaccine
  ▪ pbenefit: 1 = respondent was given information on the externality of vaccination
  ▪ hhincomeUS (USD/month): total monthly household income
  ▪ hhsize (members): household size
  ▪ age (years): age of respondent
  ▪ male: gender of respondent, 1 = male, 0 = female
  ▪ edu (categorical): educational attainment, 1 = below primary school, 2 = primary, 3 = secondary, 4 = high school, 5 = college, 6 = university or higher
  ▪ risk (ordinal): perceived risk of COVID-19 infection, 1 = "Very unlikely", 2 = "Unlikely", 3 = "Neither", 4 = "Likely", 5 = "Very likely"
DATA PREPARATION
SUMMARY STATISTICS
BIVARIATE ANALYSIS: T-TEST FOR EQUAL MEAN
CHI-SQUARED TEST
OLS WITH BINARY DEPENDENT VARIABLE: THE LINEAR PROBABILITY MODEL
DISADVANTAGES OF LPM
THE LOGIT MODEL
$$\Pr(Y_i = 1) = P_i = \frac{1}{1 + e^{-\beta X_i}}$$
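The logit probability can be evaluated directly. The following is a minimal Python sketch, assuming a single regressor and a hypothetical coefficient value (0.8) chosen only for illustration; the function name `logit_probability` is not from the lecture.

```python
import math

def logit_probability(beta, x):
    """Return P = 1 / (1 + exp(-beta * x)) for a single regressor."""
    index = beta * x
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical coefficient and regressor values, for illustration only.
p_zero = logit_probability(beta=0.8, x=0.0)    # index of 0 gives P = 0.5
p_high = logit_probability(beta=0.8, x=10.0)   # large positive index, P near 1
p_low = logit_probability(beta=0.8, x=-10.0)   # large negative index, P near 0
print(p_zero, p_high, p_low)
```

Whatever the value of the index, the result stays strictly between 0 and 1.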
THE LOGIT MODEL (OPTIONAL)
• The odds in this case are the ratio between the probability of default and the probability of non-default:
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{\beta X_i}}{1 + e^{-\beta X_i}} = e^{\beta X_i}$$
• Taking the log of both sides, we obtain the logit:
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta X_i$$
• The LPM assumes that $P_i$ is a linear function of $X_i$; the logit model assumes that the logit is a linear function of $X_i$.
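As a numerical check of the odds and logit relations, the sketch below starts from a hypothetical index value (0.4, standing in for the product of the coefficient and the regressor) and confirms that the log of the odds returns the index. All names and values are illustrative assumptions.

```python
import math

def logit_probability(index):
    """Logistic probability for a given linear index beta * X_i."""
    return 1.0 / (1.0 + math.exp(-index))

index = 0.4                      # hypothetical value of beta * X_i
p = logit_probability(index)     # probability of the event
odds = p / (1.0 - p)             # equals exp(index)
log_odds = math.log(odds)        # the logit, equals index

print(p, odds, log_odds)
```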
PROPERTIES OF LOGIT MODEL
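• Maximum Likelihood (ML): choose $\beta$ to maximize the log-likelihood
$$\ln L(\beta) = \sum_i \left[ Y_i \ln P_i + (1 - Y_i) \ln(1 - P_i) \right]$$
where $P_i = \frac{1}{1 + e^{-\beta X_i}}$ and $Y_i$ is the observed choice.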
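To make the ML idea concrete, here is a minimal Python sketch. It assumes a single regressor with no intercept, uses five made-up observations, and replaces a proper optimizer with a crude grid search; it illustrates the principle and is not how statistical software estimates the model.

```python
import math

def log_likelihood(beta, xs, ys):
    """Sum of Y*ln(P) + (1-Y)*ln(1-P), with P = 1 / (1 + exp(-beta*x))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Made-up data for illustration only (not from EMP4.xlsx).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 1, 0, 1]

# Crude grid search over beta in [-3, 3] with step 0.01.
grid = [i / 100.0 for i in range(-300, 301)]
beta_hat = max(grid, key=lambda b: log_likelihood(b, xs, ys))

print(beta_hat, log_likelihood(beta_hat, xs, ys))
```

The grid search simply picks the candidate with the highest log-likelihood; real software uses iterative numerical optimization instead.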
INTERPRETATION OF THE COEFFICIENTS
HYPOTHESIS TESTING AFTER LOGIT: LIKELIHOOD RATIO TEST
The procedure:
• Estimate the full model
$$P_i = \frac{1}{1 + e^{-\beta X_i}}$$
and obtain its log-likelihood value $LL_{\text{full}}$. (Note that log-likelihood = −deviance/2.)
• Suppose we test the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ (the test may involve one or more coefficients).
• Imposing the null hypothesis on the full model gives the restricted model, in which the variables with coefficients $\beta_1$ and $\beta_2$ are removed.
• Estimate the restricted model to obtain its log-likelihood value $LL_{\text{restricted}}$.
• Under $H_0$, the test statistic $2(LL_{\text{full}} - LL_{\text{restricted}})$ follows a $\chi^2$ distribution with df = number of coefficients tested.
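The likelihood ratio procedure can be sketched in Python as follows. The two log-likelihood values are hypothetical placeholders, not estimates from the vaccine data, and `chi2_sf` is a helper written here from the standard recurrence for the chi-squared upper tail so that the sketch needs only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, a = math.exp(-half), 1.0                 # exact for df = 2
    else:
        tail, a = math.erfc(math.sqrt(half)), 0.5      # exact for df = 1
    # Each pass raises the degrees of freedom by 2.
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail

# Hypothetical log-likelihood values, for illustration only.
ll_full = -200.0
ll_restricted = -205.0
df = 2  # number of coefficients set to zero under H0

lr_stat = 2.0 * (ll_full - ll_restricted)
p_value = chi2_sf(lr_stat, df)
print(lr_stat, p_value)
```

A small p-value leads to rejecting the null hypothesis that the tested coefficients are jointly zero.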
LOGIT REGRESSION – OVERALL SIGNIFICANCE
HYPOTHESIS TESTING AFTER LOGIT: WALD CHI-SQUARED TEST
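MARGINAL EFFECTS
• If we want to know how much $P$ changes when $X$ increases by 1 unit (the marginal effect):
$$\frac{\partial P_i}{\partial X_i} = \frac{\partial}{\partial X_i}\left(\frac{1}{1 + e^{-\beta X_i}}\right) = \beta\,\frac{e^{-\beta X_i}}{\left(1 + e^{-\beta X_i}\right)^2}$$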
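The marginal effect formula can be checked numerically. The sketch below, which assumes a single regressor and hypothetical values for the coefficient and the evaluation point, compares the analytic expression with a finite-difference derivative and with the equivalent form of the coefficient times P times (1 − P).

```python
import math

def logit_probability(beta, x):
    return 1.0 / (1.0 + math.exp(-beta * x))

def marginal_effect(beta, x):
    """Analytic derivative of the logit probability with respect to x."""
    e = math.exp(-beta * x)
    return beta * e / (1.0 + e) ** 2

beta, x = 0.8, 0.5   # hypothetical values, for illustration only
analytic = marginal_effect(beta, x)

# Central finite difference as an independent check.
h = 1e-6
numeric = (logit_probability(beta, x + h) - logit_probability(beta, x - h)) / (2 * h)

# Equivalent form: beta * P * (1 - P).
p = logit_probability(beta, x)
print(analytic, numeric, beta * p * (1.0 - p))
```

Because the effect depends on the evaluation point through P, it is not constant across observations, unlike a coefficient in a linear model.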
PARTIAL EFFECTS AFTER LOGIT - FOR THE AVERAGE OBSERVATION
AVERAGE PARTIAL EFFECTS AFTER LOGIT
PREDICTED PROBABILITY
MARGINAL EFFECTS AT SPECIFIC POINTS
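The two summary measures named in these slide titles are usually defined as follows: the partial effect for the average observation evaluates the marginal effect once, at the sample mean of the regressor, whereas the average partial effect evaluates it for every observation and then averages. A minimal Python sketch, assuming a single regressor, a hypothetical coefficient, and made-up data:

```python
import math

def logit_marginal_effect(beta, x):
    """Marginal effect beta * P * (1 - P) of the logit model at x."""
    p = 1.0 / (1.0 + math.exp(-beta * x))
    return beta * p * (1.0 - p)

beta = 0.8                          # hypothetical coefficient
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]    # made-up regressor values

# Partial effect for the average observation: evaluate at the mean of x.
x_bar = sum(xs) / len(xs)
pea = logit_marginal_effect(beta, x_bar)

# Average partial effect: average the effect over all observations.
ape = sum(logit_marginal_effect(beta, x) for x in xs) / len(xs)

print(pea, ape)
```

The two numbers generally differ because the marginal effect is a nonlinear function of the regressor.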
ROBUST STANDARD ERRORS
LOGISTIC REGRESSION WITH ODDS RATIO
THE PROBIT MODEL
• The logit model specifies
$$\Pr(Y_i = 1) = P_i = \frac{1}{1 + e^{-\beta X_i}}$$
• In the probit model, u follows the standard normal distribution:
$$\Pr(Y_i = 1) = P_i = F(\beta X_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta X_i} e^{-z^2/2}\,dz$$
where F is the cumulative distribution function (CDF) of the standard normal distribution.
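The probit probability has no closed form, but it can be computed from the error function. The sketch below is an illustration under stated assumptions: the function names are invented here, and the index values are hypothetical. It evaluates the standard normal CDF in two ways, via `math.erf` and via a simple midpoint-rule approximation of the integral.

```python
import math

def probit_probability(index):
    """Standard normal CDF at the index beta * X_i, via the error function."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

def probit_probability_by_integration(index, lower=-10.0, steps=20000):
    """Midpoint-rule approximation of the integral of the normal density."""
    width = (index - lower) / steps
    total = 0.0
    for k in range(steps):
        z = lower + (k + 0.5) * width
        total += math.exp(-z * z / 2.0)
    return total * width / math.sqrt(2.0 * math.pi)

index = 1.96  # hypothetical value of beta * X_i
print(probit_probability(index), probit_probability_by_integration(index))
```

The lower limit of −10 stands in for minus infinity; the density mass below that point is negligible.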
ESTIMATING PROBIT MODEL IN R
OVERALL SIGNIFICANCE AFTER PROBIT
TEST FOR JOINT SIGNIFICANCE
PARTIAL EFFECTS AFTER PROBIT – FOR THE AVERAGE OBS
AVERAGE PARTIAL EFFECTS AFTER PROBIT
PREDICT PROBABILITY AFTER PROBIT
MARGINAL EFFECTS AT A SPECIFIC DATA POINT
LOGIT OR PROBIT?
LOGIT OR PROBIT? COMPARING THE COEFFICIENTS
LOGIT OR PROBIT? COMPARING THE PREDICTED PROBABILITIES
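A commonly quoted rule of thumb is that logit coefficients are roughly 1.6 times the corresponding probit coefficients, because the logistic distribution has a larger variance than the standard normal. The Python sketch below assumes that scaling factor (it is an approximation, not a result from the lecture's data) and shows that, once rescaled, the two models give very similar predicted probabilities over a range of index values.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

scale = 1.6  # rule-of-thumb ratio of logit to probit coefficients (assumption)

# Compare predicted probabilities on a grid of probit index values in [-4, 4].
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logit_cdf(scale * z) - probit_cdf(z)) for z in grid)

print(max_gap)
```

The small gap is why the choice between logit and probit rarely changes the substantive conclusions, even though the raw coefficients are not directly comparable.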