binary

The document discusses the use of binary (dummy) variables in regression models to represent qualitative information, such as gender or region. It explains how to properly incorporate dummy variables to avoid issues like perfect collinearity and outlines methods for analyzing binary dependent variables, including linear probability, logit, and probit models. The document emphasizes the importance of careful selection of benchmark categories and the interpretation of coefficients in these models.

Uploaded by

tadelenatnael57

Binary (Dummy Variables)

• Describing Qualitative Information


ØIn previous chapters, both the dependent and the independent variables in our regression models were quantitative in nature (e.g., hourly wage rate, years of education, GDP, prices, and costs).
ØHowever, some variables are essentially qualitative, or nominal scale, in nature. Sex, color, religion, the industry of a firm (manufacturing, etc.), and the regions of Ethiopia are all qualitative factors.
ØQualitative factors can influence the dependent variable. For instance, holding all other factors constant, female workers are found to earn less than their male counterparts.
ØSuch information is quantified with variables that take the values 1 and 0. For example, 1 may indicate that a person is female and 0 that the person is male; or 1 may indicate that a person is a college graduate and 0 that the person is not. Variables that assume such 0 and 1 values are called dummy variables.
Note that although dummy variables are easy to incorporate in regression models, they must be used carefully. In particular:
1. When we have a dummy variable for each category or group and also an intercept in our model, we have a case of perfect collinearity, that is, an exact linear relationship among the independent variables.
• For every observation the dummy variables sum to one, which is exactly the value of the regressor attached to the intercept. Therefore, if a qualitative variable has m categories, introduce only (m − 1) dummy variables. Otherwise we fall into what is known as the dummy variable trap.
• For each qualitative regressor, the number of dummy variables introduced must be one less than the number of categories of that variable.
2. The category for which no dummy variable is assigned is known as the base, benchmark, control, comparison, reference, or omitted category, and all comparisons are made in relation to it.
• This is the category that is omitted and against which the other categories are assessed.
3. The intercept value (𝜷1) represents the mean value of the benchmark category (with any other regressors in the model set to zero).
4. The coefficients attached to the dummy variables are known as the differential intercept coefficients.
• They tell by how much the intercept of the category that receives the value of 1 differs from the intercept coefficient of the benchmark category.
5. If a qualitative variable has more than one category, the choice of the benchmark category is strictly up to the researcher.
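The dummy variable trap in point 1 can be checked on a toy sample. The sketch below is only an illustration: the six region labels are hypothetical, and the check simply confirms that a full set of dummies adds up to the intercept column, while a set with the benchmark dropped does not.

```python
# Hypothetical sample: region label for six observations (m = 3 categories).
regions = ["Addis Ababa", "Oromia", "Sidama", "Oromia", "Addis Ababa", "Sidama"]
categories = sorted(set(regions))

# One 0/1 dummy column per category (m dummies in total).
dummies = {c: [1 if r == c else 0 for r in regions] for c in categories}
intercept = [1] * len(regions)

# Each observation belongs to exactly one category, so the m dummies sum to 1.
row_sums = [sum(dummies[c][i] for c in categories) for i in range(len(regions))]
trap = row_sums == intercept        # exact linear dependence with the intercept

# With the benchmark (first category) dropped, the remaining m - 1 dummies
# no longer add up to the intercept column.
kept = categories[1:]
row_sums_kept = [sum(dummies[c][i] for c in kept) for i in range(len(regions))]
no_trap = row_sums_kept != intercept

print(trap, no_trap)
```

Here `trap` reflects the exact linear relationship that makes OLS break down when all m dummies are used together with an intercept.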
Dummy as independent variable
ØConsider the simple model of hourly wage determination:
$$\text{wage}_i = \beta_1 + \delta D_i + \beta_2\,\text{edu}_i + \varepsilon_i$$
ØIn this model, only two observed factors affect the wage rate: gender and education.
ØSince D = 1 when the person is female and D = 0 when the person is male, the parameter δ has the following interpretation: δ is the difference in hourly wage between females and males, given the same amount of education.
ØThus, the coefficient δ indicates whether there is discrimination against women: if δ < 0, then, for the same level of other factors, women earn less than men on average, and vice versa.
ØIn terms of expectations, if we assume the zero conditional mean assumption $E(\varepsilon \mid D, \text{edu}) = 0$, then
$$\delta = E(\text{wage} \mid D = 1, \text{edu}) - E(\text{wage} \mid D = 0, \text{edu})$$
ØThe key here is that the level of education is the same in both expectations; the difference, δ, is due to gender only.
ØThat is, the regression lines have the same slope but differ in their intercepts.
ØSuppose δ < 0 (i.e., males earn a higher wage than females with the same level of education). Graphically, the wage line for females is then parallel to the line for males and lies below it, with a vertical gap of |δ|.
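Written out for each group, the same-slope, different-intercept result looks as follows. The numerical values in the last line are hypothetical and serve only to illustrate the arithmetic.

$$E(\text{wage} \mid D = 0, \text{edu}) = \beta_1 + \beta_2\,\text{edu} \quad \text{(males)}$$
$$E(\text{wage} \mid D = 1, \text{edu}) = (\beta_1 + \delta) + \beta_2\,\text{edu} \quad \text{(females)}$$

For instance, assuming $\beta_1 = 10$, $\beta_2 = 2$, $\delta = -3$ and $\text{edu} = 12$, the expected wage is $10 + 2(12) = 34$ for males and $(10 - 3) + 2(12) = 31$ for females; the gap of $-3$ equals δ at every level of education.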
Multiple Dummy Variables Regression Models
ØSuppose we have several dummy explanatory variables. For simplicity, let Y be the monthly salary of public school teachers in Addis Ababa, Sidama, and Oromia (three regions in Ethiopia).
ØThen the multiple linear regression model (assuming all independent variables are dummy variables) is given by:
$$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + \varepsilon_i$$
$$D_{2i} = \begin{cases} 1 & \text{if the region is Oromia} \\ 0 & \text{otherwise} \end{cases} \qquad D_{3i} = \begin{cases} 1 & \text{if the region is Sidama} \\ 0 & \text{otherwise} \end{cases}$$
ØHere Addis Ababa is the base group.
ØAssuming that the error term satisfies the usual OLS assumptions, the multiple linear regression model gives:
ØMean salary of public school teachers in the Oromia region:
$$E(Y_i \mid D_{2i} = 1, D_{3i} = 0) = \beta_1 + \beta_2$$
ØMean salary of public school teachers in the Sidama region:
$$E(Y_i \mid D_{2i} = 0, D_{3i} = 1) = \beta_1 + \beta_3$$
ØMean salary of public school teachers in the Addis Ababa region:
$$E(Y_i \mid D_{2i} = 0, D_{3i} = 0) = \beta_1$$
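The three conditional means above imply that, in a model with only dummy regressors, the OLS intercept is the benchmark group mean and each dummy coefficient is a difference in group means. The sketch below checks this on made-up salaries (the figures are hypothetical, not the data behind the slides) by verifying that the implied residuals satisfy the OLS normal equations.

```python
# Hypothetical monthly salaries in Birr; made up purely for illustration.
salaries = {
    "Addis Ababa": [26000.0, 27000.0, 25500.0],
    "Oromia": [24000.0, 25000.0],
    "Sidama": [22000.0, 23500.0, 23000.0],
}

def mean(values):
    return sum(values) / len(values)

# Candidate coefficients built from group means (Addis Ababa is the benchmark).
b1 = mean(salaries["Addis Ababa"])
b2 = mean(salaries["Oromia"]) - b1
b3 = mean(salaries["Sidama"]) - b1

# Rows of (y, D2, D3) and the residuals implied by b1, b2, b3.
rows = [(y, int(region == "Oromia"), int(region == "Sidama"))
        for region, ys in salaries.items() for y in ys]
residuals = [y - (b1 + b2 * d2 + b3 * d3) for y, d2, d3 in rows]

# OLS normal equations: residuals are orthogonal to the constant, D2 and D3.
normal_eqs = [
    sum(residuals),
    sum(e * row[1] for e, row in zip(residuals, rows)),
    sum(e * row[2] for e, row in zip(residuals, rows)),
]
print(b1, b2, b3, normal_eqs)
```

Because all three sums are zero up to rounding error, the group-mean coefficients are the OLS solution for this sample.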
Example
• Suppose the results based on our multiple regression model are as follows:
$$\hat{Y}_i = 26{,}158 - 1{,}734\,D_{2i} - 3{,}265\,D_{3i}$$
• Each estimate is reported together with its p value.
• As these regression results show, the mean salary in the benchmark Addis Ababa region is Birr 26,158, the mean salary in the Oromia region is Birr 24,424 (26,158 − 1,734), and the mean salary in the Sidama region is Birr 22,893 (26,158 − 3,265).
• However, the estimated differential intercept coefficient for the Oromia region is not statistically significant, as its p value is about 23 percent.
• Therefore, the overall conclusion is that the mean salaries of public school teachers in the Addis Ababa and Oromia regions are not statistically different, while the mean salary of teachers in the Sidama region is statistically significantly lower, by about Birr 3,265.
Dummy as dependent variable
ØIn this case, the dependent variable takes on only two values, zero and one (i.e., it is itself a dummy variable). In other words, the regressand is a binary variable.
ØFor instance, if the dependent variable is the decision to participate in the labor force, the response variable is
$$Y_i = \begin{cases} 1 & \text{if the individual participates in the labor force} \\ 0 & \text{if the individual does not participate in the labor force} \end{cases}$$
ØSuch a binary variable can be analyzed with probability models, which can be binomial or multinomial.
ØIn this section we consider the binomial case.
ØThere are three approaches to developing a probability model for a binary response variable:
1. The linear probability model (LPM)
2. The logit model
3. The probit model
The linear probability model (LPM)
ØThe linear probability model is the name for the multiple regression model when the dependent variable is binary rather than continuous.
ØBecause the dependent variable Y is binary, the population regression function (PRF) corresponds to the probability that the dependent variable equals one, given X.
ØThe OLS predicted value, $\hat{Y}$, computed using the estimated regression function, is the predicted probability that the dependent variable equals 1, and the OLS estimator $\hat{\beta}_j$ of a slope coefficient estimates the change in the probability that Y = 1 associated with a unit change in the corresponding regressor $x_j$.
ØConsider the multiple linear regression model of the form:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$$
ØAssuming that $E(\varepsilon \mid x) = 0$, then
$$E(y \mid x) = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$$
ØThe key point is that when Y is a binary variable taking on the values zero and one, it is always true that $P(y = 1 \mid x) = E(y \mid x)$: the probability of "success", that is, the probability that Y = 1, is the same as the expected value of Y.
ØIn the LPM, the probability of success, $P(y = 1 \mid x) = E(y \mid x)$, is a linear function of the $x_j$. The model can be estimated by OLS.
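ØNow, if $P_i$ = probability that $Y_i = 1$ (that is, the event occurs), and $(1 - P_i)$ = probability that $Y_i = 0$ (that is, the event does not occur), the variable $Y_i$ has the following (Bernoulli) probability distribution:
Y_i      Probability
0        1 − P_i
1        P_i
Total    1
ØHence $E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i$.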
Interpretation of the slope coefficients in the LPM
ØSince Y can take on only two values (0 and 1), $\beta_j$ cannot be interpreted as the change in Y given a one-unit increase in $x_j$, holding all other factors fixed: Y either changes from zero to one or from one to zero.
ØNevertheless, in the LPM, $\beta_j$ measures the change in the probability of success when $x_j$ changes, holding other factors fixed.
ØThat is, $\Delta P(y = 1 \mid x) = \beta_j \Delta x_j$. This equation gives the marginal effect of $x_j$ on the probability that Y = 1.
Advantages of the LPM
ØSimple to estimate and to interpret
ØInference is the same as in multiple regression, provided heteroscedasticity-robust standard errors are used
Shortcomings of the LPM
ØThe assumption of normality for the error term does not hold in the LPM because, like $Y_i$, the disturbance term takes only two values for a given x.
ØThe variance of the error term is not homoscedastic; it depends on x.
ØNon-fulfillment of $0 \le E(y \mid x) \le 1$, that is, $0 \le p \le 1$
• The estimated line representing the predicted probabilities may lie outside the logical band of probability (i.e., predicted probabilities can be < 0 or > 1).
ØThese limitations call for nonlinear regression models such as logit and probit.
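The first two shortcomings follow from the Bernoulli nature of $Y_i$. A short derivation, using the notation $P_i = P(Y_i = 1 \mid x) = \beta_1 + \beta_2 x_{2i} + \dots + \beta_k x_{ki}$ from the LPM, is sketched below.

$$\varepsilon_i = Y_i - P_i = \begin{cases} 1 - P_i & \text{with probability } P_i \\ -P_i & \text{with probability } 1 - P_i \end{cases}$$
$$E(\varepsilon_i \mid x) = (1 - P_i)P_i + (-P_i)(1 - P_i) = 0$$
$$\operatorname{Var}(\varepsilon_i \mid x) = (1 - P_i)^2 P_i + P_i^2 (1 - P_i) = P_i (1 - P_i)$$

The error therefore takes only two values, so it cannot be normal, and its variance changes with $P_i$, and hence with x, so it cannot be homoscedastic.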
Logit model
ØThe logit model is appropriate when the response variable takes only one of two possible values, representing success and failure.
ØIt is a nonlinear regression model specifically designed for binary dependent variables.
ØBecause a regression with a binary dependent variable, Y, models the probability that Y = 1, it makes sense to adopt a nonlinear formulation that forces the predicted values to lie between 0 and 1.
ØThe logit regression model uses the logistic cumulative distribution function.
ØThe logistic distribution function is given as follows:
$$P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}$$
ØWhere $Z_i = \beta_1 + \beta_2 X_i$
$P_i$ = probability of something occurring
e = the base of the natural logarithm
ØIf $P_i$, the probability of success, is given by the above, then the probability of failure is
$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$
Characteristics of the logistic distribution function
• As $Z_i$ ranges from −∞ to +∞, $P_i$ ranges between 0 and 1.
• $P_i$ is nonlinearly related to $Z_i$ (i.e., to $X_i$) and to the β's.
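These properties are easy to check numerically. The following minimal sketch evaluates the logistic function at a few arbitrary index values (chosen only for illustration) and confirms that the probabilities stay inside (0, 1) and that the log of the odds P/(1 − P) gives back the index Z.

```python
import math

def logistic(z):
    """Logistic CDF: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary index values Z; any real number is allowed.
zs = [-10.0, -1.0, 0.0, 1.0, 10.0]
ps = [logistic(z) for z in zs]

# Second algebraic form of the same function: exp(z) / (1 + exp(z)).
alt = [math.exp(z) / (1.0 + math.exp(z)) for z in zs]

# Log of the odds P / (1 - P); this should give back z.
log_odds = [math.log(p / (1.0 - p)) for p in ps]

for z, p, l in zip(zs, ps, log_odds):
    print(z, round(p, 6), round(l, 6))
```

Equal steps in Z produce unequal steps in P, which is the nonlinearity noted above.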
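Odds ratio
ØThe odds ratio in favor of success is the ratio of the probability of success to the probability of failure:
$$\frac{P_i}{1 - P_i} = \frac{e^{Z_i} / (1 + e^{Z_i})}{1 / (1 + e^{Z_i})} = e^{Z_i}$$
ØTaking the natural logarithm of the above equation, we obtain the logit $L_i$:
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \ln e^{Z_i} = Z_i = \beta_1 + \beta_2 X_i$$
ØFor estimation purposes, a disturbance term is added: $L_i = \beta_1 + \beta_2 X_i + U_i$.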
Characteristics of the log odds ratio
ØAs $P_i$ goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit L goes from −∞ to +∞. That is, the logit is not bounded but the probability is.
ØL is linear in $X_i$ and in the parameters, but the probabilities themselves are not.
ØIf L, the log of the odds ratio, is positive, it means that when the value of the explanatory variable(s) increases, the odds that the dependent variable equals 1 increase, and vice versa.
ØThe logit model is estimated by maximum likelihood, which is generally a large-sample method. Hence, instead of the t statistic, we use the standard normal Z statistic to evaluate the statistical significance of a coefficient.
ØThe slope ($\beta_2$) measures the change in L for a unit change in X; that is, it tells how the log odds in favor of success change due to a unit change in X.
Probit model
ØThe logit model, based on the cumulative logistic distribution function, helps to estimate a regression when the dependent variable is qualitative.
ØBut the logistic function is not the only cumulative distribution function (CDF) that one can use.
ØIn some applications the normal cumulative distribution function has been found useful. The model based on it is called the probit model.
Example
ØSuppose the choice is whether to work or not. The discrete dependent variable we are working with will assume only two values, 0 and 1:
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th individual is working or seeking work} \\ 0 & \text{otherwise} \end{cases}$$
ØWhere i = 1, 2, ..., n. The independent variables that are expected to affect an individual's choice may be X1 = age, X2 = marital status, X3 = gender, X4 = education, etc.
ØThe economic interpretation of discrete choice models is typically based on the principle of utility maximization, leading to the choice of, say, A over B if the utility of A exceeds that of B.
ØTo motivate the probit model, assume that in our employment status example the decision of the ith individual to participate or not depends on an unobservable utility index $y_i^*$ (also known as a latent variable) that is determined by one or more explanatory variables, say income.
• The larger the value of the index $y_i^*$, the greater the probability that the individual is part of the labor market.
ØLatent variable: an unobservable $y^*$ that can take any value in (−∞, +∞).
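• $y_i^*$ = utility(labor income) − utility(non-labor income)
• Underlying latent model:
$$y_i = \begin{cases} 1, & y_i^* > 0 \\ 0, & y_i^* \le 0 \end{cases}$$
where $y_i^* = x_i\beta + \varepsilon_i$
• Probit is based on this latent model:
$$\begin{aligned} P(y_i = 1 \mid x) &= P(y_i^* > 0 \mid x) \\ &= P(x\beta + \varepsilon_i > 0 \mid x) \\ &= P(\varepsilon_i > -x\beta \mid x) \\ &= 1 - F(-x\beta) \end{aligned}$$
where F is the cumulative distribution function of $\varepsilon_i$.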
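When the error is standard normal, as the probit model assumes, its distribution is symmetric around zero, so 1 − F(−xβ) equals F(xβ), and the response probability is simply the normal CDF evaluated at the index. The sketch below checks this identity numerically for a few hypothetical index values; `norm_cdf` is a helper defined here, not a library function.

```python
import math

def norm_cdf(z):
    """Standard normal CDF written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical values of the index x*beta.
index_values = [-2.0, -0.5, 0.0, 0.5, 2.0]

# P(eps > -x*beta) computed as 1 - F(-x*beta) ...
p_latent = [1.0 - norm_cdf(-xb) for xb in index_values]
# ... and the probit response probability F(x*beta).
p_probit = [norm_cdf(xb) for xb in index_values]

for xb, a, b in zip(index_values, p_latent, p_probit):
    print(xb, round(a, 6), round(b, 6))
```

The two columns coincide, which is why the probit model is usually written as P(y = 1 | x) = F(xβ) with F the standard normal CDF.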
Choosing Between Logit and Probit
ØIn the dichotomous case, there is no basis in statistical theory for preferring one over the other.
ØIn most applications it makes no difference which one is used.
ØIn small samples the two models can give noticeably different results, but they are quite similar in large samples.
Example
• Suppose the final grade of third-year Economics students in Development Economics one is determined by their entrance exam result (GPA), their score on an examination given at the beginning of the term to test entering knowledge of Macro Economics one (TS), and exposure to a personalized system of instruction (PSI). Furthermore, let Y = 1 if a student's final grade in the Development Economics one course was A and Y = 0 otherwise (if the final grade was B or C), and let PSI = 1 if the new teaching method is used and 0 otherwise.
• $Y_i = \beta_0 + \beta_1\,GPA_i + \beta_2\,TS_i + \beta_3\,PSI_i + U_i$
The Interpretation of the Coefficient (LPM)

Y        Coef.       Std. Err.    t       P>|t|    [95% Conf. Interval]
GPA      .4638517    .1619563     2.86    0.008     .1320992    .7956043
TS       .0104951    .0194829     0.54    0.594    -.0294137    .0504039
PSI      .3785548    .1391727     2.72    0.011     .0934724    .6636372
_cons   -1.498017    .5238886    -2.86    0.008    -2.571154   -.4248801

• A one-point increase in a student's GPA increases the probability of scoring an A grade by 0.46 (46 percentage points), holding other factors constant.
• Exposure to the innovative teaching method (PSI) increases the probability of scoring an A grade by 0.38 (38 percentage points), holding other factors constant.
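The coefficient table above can be used to compute fitted probabilities directly, which also makes the main weakness of the LPM visible. In the sketch below the coefficients are copied from the table, while the three student profiles are hypothetical and chosen only to show that fitted values can fall outside the unit interval.

```python
# Coefficients copied from the OLS (LPM) output above.
coef = {"_cons": -1.498017, "GPA": 0.4638517, "TS": 0.0104951, "PSI": 0.3785548}

def lpm_probability(gpa, ts, psi):
    """Fitted LPM value: a linear function of the regressors."""
    return coef["_cons"] + coef["GPA"] * gpa + coef["TS"] * ts + coef["PSI"] * psi

# Hypothetical student profiles (not observations from the data set).
low = lpm_probability(gpa=2.0, ts=20, psi=0)    # weak student, no PSI
high = lpm_probability(gpa=4.0, ts=29, psi=1)   # strong student with PSI

# Effect of PSI for an otherwise identical student: equals the PSI coefficient.
psi_effect = lpm_probability(3.0, 20, 1) - lpm_probability(3.0, 20, 0)

print(round(low, 4), round(high, 4), round(psi_effect, 4))
```

The first fitted value is negative and the second exceeds one, even though both are meant to be probabilities; the PSI effect is the same 0.38 for every student, whatever the GPA.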
The Interpretation of the Odds Ratio of the Logit Model
Logistic regression                      Number of obs  =  32
                                         LR chi2(3)     =  15.40
                                         Prob > chi2    =  0.0015
Log likelihood = -12.889633              Pseudo R2      =  0.3740

Y       Odds Ratio   Std. Err.    z       P>|z|    [95% Conf. Interval]
TS      1.099832     .1556859     0.67    0.501    .8333651    1.451502
GPA     16.87972     21.31809     2.24    0.025    1.420194    200.6239
PSI     10.79073     11.48743     2.23    0.025    1.339344    86.93802

• Holding other factors constant, the odds in favor of scoring an A grade in Development Economics one are multiplied by 16.88 when the entrance exam result (GPA) increases by one grade point.
• Ceteris paribus, the odds in favor of scoring an A grade in Development Economics one are multiplied by 10.79 if the student is exposed to the new teaching method (PSI).
• GPA and PSI are statistically significant at the 5 percent significance level; TS is not (p = 0.501).
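Since the odds equal e^Z, each reported odds ratio is the exponential of the corresponding logit slope coefficient, so the coefficients on the log-odds scale can be recovered by taking natural logarithms. A minimal sketch using the odds ratios from the table:

```python
import math

# Odds ratios copied from the logistic regression output above.
odds_ratio = {"TS": 1.099832, "GPA": 16.87972, "PSI": 10.79073}

# Logit slope coefficients (change in log odds per unit change in the regressor).
logit_coef = {name: math.log(value) for name, value in odds_ratio.items()}

# Percentage change in the odds per unit change in the regressor.
pct_change_in_odds = {name: (value - 1.0) * 100.0 for name, value in odds_ratio.items()}

for name in odds_ratio:
    print(name, round(logit_coef[name], 4), round(pct_change_in_odds[name], 1))
```

A positive log-odds coefficient corresponds to an odds ratio above one, and a coefficient near zero (as for TS) to an odds ratio near one.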
Application of the Probit Model When the Dependent Variable Takes Only Two Values
Probit regression                        Number of obs  =  32
                                         LR chi2(3)     =  15.55
                                         Prob > chi2    =  0.0014
Log likelihood = -12.818803              Pseudo R2      =  0.3775

Y        Coef.       Std. Err.    z       P>|z|    [95% Conf. Interval]
GPA      1.62581     .6938818     2.34    0.019     .2658269    2.985794
TS       .0517289    .0838901     0.62    0.537    -.1126927    .2161506
PSI      1.426332    .595037      2.40    0.017     .2600814    2.592583
_cons   -7.45232     2.542467    -2.93    0.003    -12.43546   -2.46917

Interpretation of the Estimated Coefficient
ØIn the probit model, the estimated coefficients do not directly quantify the effect of the independent variables on the probability that the dependent variable takes on the value of one.
ØThe estimated coefficients are parameters of the latent model. Hence, there are limited ways in which we can interpret the individual regression coefficients.
• A positive coefficient means that an increase in the predictor leads to an increase in the predicted probability.
• A negative coefficient means that an increase in the predictor leads to a decrease in the predicted probability.
• GPA: The coefficient of GPA is 1.62581. This means that an increase in GPA increases the predicted probability of scoring an A grade in Development Economics one.
• TS: The coefficient of TS is 0.0517289. This means that a higher score on the Macro Economics one entry test increases the predicted probability of scoring an A grade in Development Economics one, although this coefficient is not statistically significant (p = 0.537).
• PSI: The coefficient of PSI is 1.426332. This means that exposure to the new personalized system of instruction increases the predicted probability of scoring an A grade in Development Economics one.
• _cons: The constant term is −7.45232. This means that if all of the predictors (GPA, TS and PSI) are evaluated at zero, the predicted probability of scoring an A grade in Development Economics one is F(−7.45232) = 4.586e-14. So, as expected, a student with a GPA of zero and a TS of zero who was not exposed to the new personalized system of instruction has an extremely low predicted probability of scoring an A grade.
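The value F(−7.45232) can be reproduced from the standard normal CDF. The sketch below uses the complementary error function from the Python standard library, which stays accurate this far out in the tail; `norm_cdf` is a helper defined here for the illustration.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

constant = -7.45232                    # _cons from the probit output
p_all_zero = norm_cdf(constant)        # predicted probability when GPA = TS = PSI = 0

print(f"{p_all_zero:.3e}")
```

The result is of the order of 1e-14, in line with the figure quoted above.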
The marginal effect of a unit change in the value of a
regressor in the various regression models
ØIn the LPM, the slope coefficient directly measures the change in the probability of an event occurring as the result of a unit change in the value of a regressor, with the effect of all other variables held constant.
ØIn the logit model, the slope coefficient of a variable gives the change in the log of the odds associated with a unit change in that variable, again holding all other variables constant.
ØFor the logit model, the rate of change in the probability of an event happening is given by $\beta_j P_i (1 - P_i)$, where $\beta_j$ is the partial regression coefficient of the jth regressor. But in evaluating $P_i$, all the variables included in the analysis are involved.
ØIn the probit model, the rate of change in the probability is somewhat more complicated and is given by $\beta_j f(Z_i)$, where $f(Z_i)$ is the density function of the standard normal variable and $Z_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$, that is, the regression model used in the analysis.
ØThus, in both the logit and probit models all the regressors are involved in computing the change in probability, whereas in the LPM only the jth regressor is involved.
ØThis difference may be one reason for the early popularity of the LPM.
Practical Example: Interpretation of the Marginal Effects in the Logit Model
Marginal effects after logit
      y = Pr(Y) (predict)
        = .25282025

variable   dy/dx       Std. Err.   z       P>|z|    [ 95% C.I. ]           X
GPA        .5338589    .23704      2.25    0.024     .069273    .998445    3.11719
TS         .0179755    .02624      0.69    0.493    -.033448    .069399    21.9375
PSI*       .4564984    .18105      2.52    0.012     .10164     .811357    .4375
(*) dy/dx is for discrete change of dummy variable from 0 to 1

GPA = 0.5339: Evaluated at the sample means of the regressors (column X), the probability of scoring an A grade in Development Economics one increases by about 0.53 (53.39 percentage points) per one-point increase in GPA, holding all other factors constant. Because this is a derivative, it is best read as the effect of a small change in GPA.
PSI = 0.4565: The probability of scoring an A grade in Development Economics one increases by 0.4565 (45.65 percentage points) if the student is exposed to the new personalized system of instruction, holding all other factors constant.
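The marginal effects of the continuous regressors can be reproduced from the logit formula β_j P(1 − P). The sketch below assumes that the logit slopes are the natural logs of the odds ratios reported earlier for the same model (the coefficient table itself is not shown in the slides) and uses the predicted probability at the means, 0.25282025, from the output above.

```python
import math

# Predicted probability at the sample means, from the marginal effects output.
p_at_means = 0.25282025

# Logit slopes recovered as logs of the odds ratios reported earlier (assumption:
# both outputs come from the same fitted logit model).
beta = {"GPA": math.log(16.87972), "TS": math.log(1.099832)}

# Logit marginal effect of a continuous regressor: beta_j * P * (1 - P).
scale = p_at_means * (1.0 - p_at_means)
marginal_effect = {name: b * scale for name, b in beta.items()}

for name, value in marginal_effect.items():
    print(name, round(value, 4))
```

Both values agree with the dy/dx column for GPA and TS. PSI is a dummy, so its reported effect is a discrete change from 0 to 1 rather than a derivative and is not reproduced by this formula.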
Practical Example: Interpretation of the Marginal Effects in the Probit Model
Marginal effects after probit
      y = Pr(Y) (predict)
        = .26580809

variable   dy/dx       Std. Err.   z       P>|z|    [ 95% C.I. ]           X
GPA        .5333471    .23246      2.29    0.022     .077726    .988968    3.11719
TS         .0169697    .02712      0.63    0.531    -.036184    .070123    21.9375
PSI*       .464426     .17028      2.73    0.006     .130682    .79817     .4375
(*) dy/dx is for discrete change of dummy variable from 0 to 1

GPA = 0.5333: Evaluated at the sample means of the regressors (column X), the probability of scoring an A grade in Development Economics one increases by about 0.53 (53.33 percentage points) per one-point increase in GPA, holding all other factors constant. Because this is a derivative, it is best read as the effect of a small change in GPA.
PSI = 0.4644: The probability of scoring an A grade in Development Economics one increases by 0.4644 (46.44 percentage points) if the student is exposed to the new personalized system of instruction, holding all other factors constant.
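The probit marginal effects of the continuous regressors follow from β_j f(Z), with Z evaluated at the sample means. A minimal sketch, using the probit coefficients reported earlier and recovering Z from the predicted probability 0.26580809 with the standard library's `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard normal: mean 0, standard deviation 1

# Predicted probability at the sample means, from the marginal effects output.
p_at_means = 0.26580809

# Index Z at the means, recovered by inverting the normal CDF, and its density f(Z).
z_at_means = std_normal.inv_cdf(p_at_means)
density = std_normal.pdf(z_at_means)

# Probit coefficients copied from the probit regression output.
beta = {"GPA": 1.62581, "TS": 0.0517289}

# Probit marginal effect of a continuous regressor: beta_j * f(Z).
marginal_effect = {name: b * density for name, b in beta.items()}

for name, value in marginal_effect.items():
    print(name, round(value, 4))
```

The results match the dy/dx column for GPA and TS to rounding. As in the logit case, the PSI entry is a discrete change for a dummy variable and is computed differently.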
