Chapter 1 Econometrics


WOLKITE UNIVERSITY

COLLEGE OF BUSINESS AND ECONOMICS

DEPARTMENT OF ECONOMICS

Econometrics II: Module

By: Hulunayen Y. (MSc.)

 Course Description:
This course is a continuation of Econometrics I. It aims at introducing the theory (and
practice) of regression on qualitative data, time series and panel data econometrics as well as
simultaneous equation modeling. It first makes an introduction to the basic concepts in
qualitative data modeling such as dummy variable regression and binary choice models (LPM,
Logit and Probit). Elementary time series models, estimations and tests for stationarity of data
will then be discussed. It also covers introduction to simultaneous equation modeling with
alternative estimation methods. Introductory pooled cross-sectional and panel data models will
finally be highlighted. All of these theoretical concepts will also be complemented by computer
lab practices using statistical packages such as STATA, EViews, PcGive, etc. applied on available
Ethiopian/international data.

CHAPTER ONE

REGRESSION ANALYSIS WITH QUALITATIVE DATA: BINARY (DUMMY VARIABLES)

1.1 Describing Qualitative Data

 Economic data measure the financial health or wellbeing of a country, specific regions,
categories, or individual markets. They describe an actual economy, past or present. The
success of any econometric analysis ultimately depends on the availability of appropriate data.
 Types of data: data can be divided in many ways.
 For instance, based on source, data can be divided into primary data and secondary data.
 Data can also be divided into cross-sectional data, time series data and panel data (please
read).
 Data can also be divided into qualitative data and quantitative data.
 Quantitative data: quantitative data are facts that are already expressed as numbers, that is,
data that can be quantified. Examples: income, prices, money and the like.

1|Page March 15, 2023


 Qualitative data: qualitative variables are sometimes called dummy variables or categorical
variables. These are variables that cannot be measured on a numerical scale. When a variable is
qualitative in nature, the data are recorded in the form of an indicator function. The values of
such a variable do not reflect the magnitude of anything; they reflect only the presence or
absence of a characteristic. For example, variables like religion, sex, taste, etc. are
qualitative variables.

1.2 Dummy Regressors

 One generally encounters four types of variables in empirical analysis: ratio scale,
interval scale, ordinal scale, and nominal scale.
 Regression models may involve not only ratio scale variables but also nominal scale
variables.
 Nominal scale variables are also known as indicator variables, categorical variables,
qualitative variables, or dummy variables.

1.2.1 The Nature Of Dummy Variables:

 In regression analysis the dependent variable is frequently influenced not only by variables
that can be readily quantified on some well-defined scale (e.g., income, output, prices, costs,
height, and temperature), but also by variables that are essentially qualitative in nature (e.g.,
sex, race, color, religion, nationality, wars, earthquakes, strikes, political upheavals, and
changes in government economic policy).
 For example, holding all other factors constant, female college professors are found to earn
less than their male counterparts, and nonwhites are found to earn less than whites.
 This pattern may result from sex or racial discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the dependent variable and clearly should be
included among the explanatory variables.
 Since such qualitative variables usually indicate the presence or absence of a “quality” or an
attribute, such as male or female, black or white, or Christian or Muslim, one method of
“quantifying” such attributes is by constructing artificial variables that take on values of 1 or
0, 0 indicating the absence of an attribute and 1 indicating the presence (or possession) of
that attribute. For example, 1 may indicate that a person is a male, and 0 may designate a
female; or 1 may indicate that a person is a college graduate, and 0 that he is not, and so on.
 Variables that assume such 0 and 1 values are called dummy variables.
 Alternative names are indicator variables, binary variables, categorical variables, and
dichotomous variables.
 Dummy variables can be used in regression models just as easily as quantitative variables. As
a matter of fact, a regression model may contain explanatory variables that are exclusively
dummy, or qualitative, in nature.

 Example: consider the model Yi = α + βDi + ui, where
   Yi = annual salary of a college professor
   Di = 1 if male college professor
      = 0 otherwise (i.e., female professor)
 Note: this is like the two-variable regression models encountered previously, except that
instead of a quantitative X variable we have a dummy variable D (hereafter, we shall
designate all dummy variables by the letter D).
 This model may enable us to find out whether sex makes any difference in a college
professor’s salary, assuming, of course, that all other variables such as age, degree attained,
and years of experience are held constant. Assuming that the disturbances satisfy the usual
assumptions of the classical linear regression model, we obtain:
 Mean salary of female college professor: E(Yi | Di = 0) = α
 Mean salary of male college professor: E(Yi | Di = 1) = α + β

 The intercept term α gives the mean salary of female college professors and the slope
coefficient β tells by how much the mean salary of a male college professor differs from the
mean salary of his female counterpart, with α + β reflecting the mean salary of the male
college professor. A test of the null hypothesis that there is no sex discrimination (H0: β = 0)
can easily be made by running the regression in the usual manner and finding out whether, on
the basis of the t test, the estimated β is statistically significant.
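 How to test whether there is sex discrimination or not?
 Example: Ŷi = 18,000 + 3,280Di
              (0.32) (0.44)
          t = (57.74) (7.439)    R² = 0.8737
 Based on this result, the estimated mean salary of female college professors is birr 18,000 and
that of male professors is birr 21,280 (= 18,000 + 3,280). Because the t value of the dummy
coefficient (7.439) is large, the estimated β is statistically significant and H0: β = 0 is rejected.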
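The interpretation of α and β as group means can be checked numerically. The sketch below is only an illustration: the six salaries are made-up numbers, chosen so that the group means equal the birr 18,000 and birr 21,280 quoted above, and `ols_simple` is a hypothetical helper written for this example.

```python
# Hypothetical salaries (birr); invented numbers, not data from the module.
salaries = [17500.0, 18200.0, 18300.0, 21000.0, 21400.0, 21440.0]
male = [0, 0, 0, 1, 1, 1]  # D_i = 1 if male, 0 if female


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


alpha_hat, beta_hat = ols_simple(male, salaries)

# Group means computed directly, for comparison with the OLS estimates.
female_mean = sum(s for s, d in zip(salaries, male) if d == 0) / male.count(0)
male_mean = sum(s for s, d in zip(salaries, male) if d == 1) / male.count(1)

print("intercept:", alpha_hat, "female mean:", female_mean)
print("intercept + slope:", alpha_hat + beta_hat, "male mean:", male_mean)
```

With a single dummy regressor, the OLS intercept coincides with the mean of the base group and the slope with the difference between the two group means.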

1.2.2. Regression on one quantitative variable and one qualitative variable with two classes, or categories

 Consider the model: Yi = α1 + α2Di + βXi + ui
 where Yi = annual salary of a college professor
 Xi = years of teaching experience
 Di = 1 if male college professor
      = 0 otherwise (i.e., female professor)
 The model contains one quantitative variable (years of teaching experience) and one
qualitative variable (sex) that has two classes (or levels, classifications, or categories),
namely, male and female. What is the meaning of this equation? Assuming, as usual,
that E(ui) = 0, we see that

 Mean salary of female college professor: E(Yi | Xi, Di = 0) = α1 + βXi
 Mean salary of male college professor: E(Yi | Xi, Di = 1) = (α1 + α2) + βXi
 The model postulates that the male and female college professors’ salary functions in
relation to the years of teaching experience have the same slope (β) but different
intercepts.
 The level of the male professor’s mean salary is different from that of the female
professor’s mean salary (by α2), but the rate of change in the mean annual salary by
years of experience is the same for both sexes.

 Figure: Hypothetical scattergram of annual salary against years of teaching
experience of college professors.

 If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions have the same intercept (i.e., there is no sex discrimination) can be made easily
by running the regression and noting the statistical significance of the estimated α2 on the
basis of the traditional t test.
 If the t test shows that α2 is statistically significant, we reject the null hypothesis that the
male and female college professors’ levels of mean annual salary are the same.
 Before proceeding further, note the following features of the dummy variable regression
model considered previously.

i. To distinguish the two categories, male and female, we have introduced only one dummy
variable Di. For if Di = 1 always denotes a male, when Di = 0 we know that it is a female,
since there are only two possible outcomes. Hence, one dummy variable suffices to
distinguish two categories. The general rule is this: if a qualitative variable has ‘m’ categories,
introduce only ‘m − 1’ dummy variables. In our example, sex has two categories, and hence we
introduced only a single dummy variable. If this rule is not followed, we shall fall into what
might be called the dummy variable trap, that is, the situation of perfect multicollinearity.
ii. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary in
the sense that in our example we could have assigned D=1 for female and D=0 for male.
iii. The group, category, or classification that is assigned the value of 0 is often referred to as the
base, benchmark, control, comparison, reference, or omitted category. It is the base in the
sense that comparisons are made with that category.
iv. The coefficient α2 attached to the dummy variable D can be called the differential intercept
coefficient because it tells by how much the value of the intercept term of the category that
receives the value of 1 differs from the intercept coefficient of the base category.
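The dummy variable trap mentioned in feature (i) can be illustrated with a small numerical check. This is a minimal sketch on a made-up sample of five professors: when an intercept, a male dummy and a female dummy are all included, the male and female columns add up to the intercept column, so the cross-product matrix X'X is singular and the OLS normal equations have no unique solution. The helper names are hypothetical.

```python
# Made-up sample of five professors: 1 = male, 0 = female.
male = [1, 0, 1, 0, 1]
female = [1 - d for d in male]   # second dummy for the same two-category variable
const = [1] * len(male)          # intercept column


def cross_products(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Intercept + m dummies (the trap): const = male + female, so X'X is singular.
det_trap = det3(cross_products([const, male, female]))
# Intercept + (m - 1) dummies: X'X is invertible.
det_ok = det2(cross_products([const, male]))

print("det with both dummies:", det_trap)
print("det with one dummy:", det_ok)
```

A zero determinant in the first case is the numerical face of perfect multicollinearity; dropping one dummy restores a unique solution.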

1.2.3 Regression on one quantitative variable and one qualitative variable with
more than two classes
 Suppose that, on the basis of the cross-sectional data, we want to regress the annual expenditure
on health care by an individual on the income and education of the individual.
 Since the variable education is qualitative in nature, suppose we consider three mutually
exclusive levels of education: less than high school, high school, and college.
 Now, unlike the previous case, we have more than two categories of the qualitative variable
education.
 Therefore, following the rule that the number of dummies be one less than the number of
categories of the variable, we should introduce two dummies to take care of the three levels of
education.
 Assuming that the three educational groups have a common slope but different intercepts in the
regression of annual expenditure on health care on annual income, we can use the following
model:
   Yi = α1 + α2D2i + α3D3i + βXi + ui
 where Yi = annual expenditure on health care
   Xi = annual income
   D2i = 1 if high school education, = 0 otherwise
   D3i = 1 if college education, = 0 otherwise
 Note that in the preceding assignment of the dummy variables we are arbitrarily treating the
“less than high school education” category as the base category.
 Therefore, the intercept α1 will reflect the intercept for this category.
 The differential intercepts α2 and α3 tell by how much the intercepts of the other two
categories differ from the intercept of the base category, which can be readily checked as
follows: assuming E(ui) = 0, we obtain
   E(Yi | D2i = 0, D3i = 0, Xi) = α1 + βXi
   E(Yi | D2i = 1, D3i = 0, Xi) = (α1 + α2) + βXi
   E(Yi | D2i = 0, D3i = 1, Xi) = (α1 + α3) + βXi
which are, respectively, the mean health care expenditure functions for the three levels of
education, namely, less than high school, high school, and college.
 Geometrically, the situation is shown in the figure given below (for illustrative purposes it is
assumed that α3 > α2).

Figure: Expenditure on health care in relation to income for three levels of education
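The rule of one dummy fewer than the number of categories can be sketched in code. The function `make_dummies` and the four sample observations are hypothetical, introduced only to show how a three-level education variable turns into two 0/1 columns once a base category is chosen.

```python
def make_dummies(values, categories, base):
    """Return one 0/1 column per non-base category (m - 1 columns in total)."""
    kept = [c for c in categories if c != base]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Made-up observations on education for four individuals.
education = ["less than high school", "college", "high school", "college"]
levels = ["less than high school", "high school", "college"]

dummies = make_dummies(education, levels, base="less than high school")
for name, column in dummies.items():
    print(name, column)
```

An individual in the base category has zeros in every dummy column, which is why the intercept of the regression belongs to that category.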

1.2.4 Regression on one quantitative variable and two qualitative variables


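 The technique of dummy variables can easily be extended to handle more than one qualitative
variable. Let us revert to the college professors’ salary regression of Section 1.2.2, but now
assume that in addition to years of teaching experience and sex, the skin color of the teacher
is also an important determinant of salary.
 For simplicity, assume that color has two categories: black and white.
 We can now write this as:
   Yi = α1 + α2D2i + α3D3i + βXi + ui
 where Yi = annual salary
   Xi = years of teaching experience
   D2i = 1 if male, = 0 otherwise
   D3i = 1 if white, = 0 otherwise
 Notice that each of the two qualitative variables, sex and color, has two categories and hence
needs one dummy variable for each. Note also that the omitted, or base, category now is
“black female professor.”
 Assuming E(ui) = 0, the mean salary functions are:
   black female professor: E(Yi | D2i = 0, D3i = 0, Xi) = α1 + βXi
   black male professor:   E(Yi | D2i = 1, D3i = 0, Xi) = (α1 + α2) + βXi
   white female professor: E(Yi | D2i = 0, D3i = 1, Xi) = (α1 + α3) + βXi
   white male professor:   E(Yi | D2i = 1, D3i = 1, Xi) = (α1 + α2 + α3) + βXi
 Once again, it is assumed that the preceding regressions differ only in the intercept coefficient
but not in the slope coefficient (β).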
 OLS estimation of this model will enable us to test a variety of hypotheses.
 Thus, if α3 is statistically significant, it will mean that colour does affect a professor’s salary.
 Similarly, if α2 is statistically significant, it will mean that sex also affects a professor’s salary.
 If both these differential intercepts are statistically significant, it would mean that sex as well
as color is an important determinant of professors’ salaries.
 From the preceding discussion it follows that we can extend our model to include more than
one quantitative variable and more than two qualitative variables.
 The only precaution to be taken is that the number of dummies for each qualitative variable
should be one less than the number of categories of that variable.
 Interaction Effects: Consider the following model:
   Yi = α1 + α2D2i + α3D3i + βXi + ui
 where Yi = annual expenditure on clothing
   Xi = income
   D2i = 1 if female, = 0 if male
   D3i = 1 if college graduate, = 0 otherwise
 Implicit in this model is the assumption that the differential effect of the sex dummy D2 is
constant across the two levels of education and the differential effect of the education dummy
D3 is also constant across the two sexes. That is, if, say, the mean expenditure on clothing is
higher for females than males, this is so whether they are college graduates or not. Likewise, if,
say, college graduates on the average spend more on clothing than non-college graduates, this
is so whether they are females or males.
 In many applications such an assumption may be untenable. A female college graduate may
spend more on clothing than a male graduate. In other words, there may be interaction
between the two qualitative variables D2 and D3, and therefore their effect on mean Y may not
be simply additive but multiplicative as well, as in the following model:
   Yi = α1 + α2D2i + α3D3i + α4(D2iD3i) + βXi + ui
From this we obtain
   E(Yi | D2i = 1, D3i = 1, Xi) = (α1 + α2 + α3 + α4) + βXi
 This is the mean clothing expenditure of graduate females. Notice that
 α2 = differential effect of being a female
 α3 = differential effect of being a college graduate
 α4 = differential effect of being a female graduate
 This shows that the mean clothing expenditure of graduate females is different (by α4) from
the mean clothing expenditure of females or college graduates.
 If α2, α3, and α4 are all positive, the average clothing expenditure of females is higher
(than the base category, which here is male non-graduate), but it is much more so if the
females also happen to be graduates.
 Similarly, the average expenditure on clothing by a college graduate tends to be higher than the
base category but much more so if the graduate happens to be a female.
 This shows how the interaction dummy modifies the effect of the two attributes considered
individually. Whether the coefficient of the interaction dummy is statistically significant can be
tested by the usual t test.
 If it turns out to be significant, the simultaneous presence of the two attributes will attenuate
or reinforce the individual effects of these attributes.
 Needless to say, omitting a significant interaction term incorrectly will lead to a specification
bias.
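The role of the interaction coefficient can be made concrete with a short calculation. The sketch assumes a model of the form Yi = α1 + α2D2i + α3D3i + α4(D2iD3i) + βXi + ui, with D2 = 1 for females and D3 = 1 for college graduates, as the definitions of α2, α3 and α4 above suggest. All coefficient values are hypothetical.

```python
def mean_expenditure(female, graduate, income, a1, a2, a3, a4, b):
    """Mean clothing expenditure implied by the dummy model with an interaction term."""
    return a1 + a2 * female + a3 * graduate + a4 * female * graduate + b * income


# Hypothetical coefficients, for illustration only.
coef = dict(a1=100.0, a2=20.0, a3=30.0, a4=15.0, b=0.05)
income = 1000.0

base = mean_expenditure(0, 0, income, **coef)             # male non-graduate
female_only = mean_expenditure(1, 0, income, **coef)      # female non-graduate
graduate_only = mean_expenditure(0, 1, income, **coef)    # male graduate
female_graduate = mean_expenditure(1, 1, income, **coef)  # female graduate

# What a purely additive model would predict for a female graduate.
additive_prediction = base + (female_only - base) + (graduate_only - base)

print("female graduate:", female_graduate)
print("additive prediction:", additive_prediction)
print("gap due to interaction:", female_graduate - additive_prediction)
```

The gap between the two predictions equals α4, which is exactly the quantity tested by the t test on the interaction dummy.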

1.3 Limited Dependent Variable Models
 So far, we have implicitly assumed that the dependent variable, or the response variable Y, is
quantitative (for example, Y is a dollar amount, a test score, a percentage, or the logs of these),
whereas the explanatory variables are quantitative, qualitative (or dummy), or a mixture of the two.
 Moreover, standard linear regression models are applied when the behaviour of economic agents is
approximated by continuous variables such as income, saving, expenditure, output, etc.
 But there are many situations in which the dependent variable in a regression equation simply
represents a discrete choice assuming only a limited number of values.
 Models involving dependent variables of this kind are called limited (discrete) dependent
variable models (also called qualitative response models)
 What happens if we want to use multiple regressions to explain a qualitative event?

 In the simplest case, and one that often arises in practice, the event we would like to
explain is a binary outcome.
 In other words, our dependent variable, Y, takes on only two values: zero and one. For
example, Y can be defined to indicate whether an adult has a high school education; or
Y can indicate whether a college student used illegal drugs during a given school year; or
Y can indicate whether a firm was taken over by another firm during a given year.
 In each of these examples, we can let Y=1 denote one of the outcomes and Y=0 the
other outcome.
 In short, qualitative response models are models in which the dependent variable is a
discrete outcome. Look at the following example:
 Y = α0 + α1 X1 + α2 X2
o Y = 1, if individual i attended college
= 0, otherwise
o In the above example the dependent variable Y takes on only two values (i.e., 0
and 1). Conventional regression is not well suited to analysing a model with such a
qualitative dependent variable.
o Such models are analysed in the general framework of probability models.

 Limited Dependent Variable (LDV):


 Limited dependent variable is broadly defined as a dependent variable whose range of
values is substantively restricted.
 Examples: college GPA, the percentage of people participating in a pension plan,
number of arrests, number of children born to a woman.
 Most economic variables we would like to explain are limited in some way, often because
they must be positive.
 For example, hourly wage, housing price, and nominal interest rates must be
greater than zero.
 Another important kind of LDV is a count variable, which takes on non-negative integer
values.
 A binary dependent variable is an example of a limited dependent variable (LDV).
 Limited dependent variable models can be used for time series and panel data, but
they are most often applied to cross-sectional data.
 In a model where Y is quantitative, our objective is to estimate its expected, or mean,
value given the values of the regressors.
 Whereas, in models where Y is qualitative (binary dependent variable), our
objective is to find the probability of something happening. Hence, qualitative
response regression models are often known as probability models.

 There are several methods for analysing regression models in which the dependent variable is
binary. We now turn to the four most commonly used approaches to developing a probability
model for a binary response variable (types of binomial models):
i. The linear probability model
ii. The logit model
iii. The probit model
iv. The tobit (censored regression) model
 The simplest procedure is to just use the usual OLS method. In this case the model is called the
linear probability model (LPM). The other alternative is to say that there is an underlying or
latent variable y* which we do not observe. What we observe is y = 1 if y* > 0 and y = 0 otherwise.

1.3.1 The Linear Probability Model (LPM)

 In the 1960’s and early 1970’s the linear probability model was widely used mainly because it is a
model that can be easily estimated using multiple regression analysis.
 The term linear probability model (LPM) is used to denote a regression model in which the
dependent variable y is a dichotomous variable taking the value one (1) if the event occurs or
zero (0) otherwise.
 When we use a linear regression model to estimate probabilities, we call the model the linear
probability model.
 To fix ideas, consider the following regression model:
 Yi = β1 + β2Xi + ui where X = family income and Y = 1 if the family owns a house and
0 if it does not own a house.
 This looks like a typical linear regression model, but because the regressand is
binary, or dichotomous, it is called a linear probability model (LPM).
 This is because E(Yi | Xi) can be interpreted as the conditional probability that the
event will occur given Xi, that is, Pr(Yi = 1 | Xi).
 Thus, in our example, E(Yi | Xi) gives the probability that a family whose income is
the given amount Xi owns a house.
 A numerical example: with home ownership Y (1 = owns a house, 0 = does
not own a house) and family income X (thousands of dollars), the LPM
estimated by OLS was as follows:
   Ŷi = −0.9457 + 0.1021Xi
   se = (0.1228) (0.0082)
   t = (−7.6984) (12.515)
   R² = 0.8048
The above regression is interpreted as follows:

o The intercept of −0.9457 gives the “probability” that a family with zero income
will own a house. Since this value is negative, and a probability cannot be negative,
we treat it as zero.
o The slope value of 0.1021 means that for a unit change in income (here $1,000),
on the average the probability of owning a house increases by 0.1021, or about 10
percentage points. This is so whatever the initial level of income, which seems
patently unrealistic. In reality one would expect Pi to be non-linearly related to
Xi.
o In general, βj measures the change in the probability of success when Xj changes by one
unit, holding other factors fixed.
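The two interpretation problems noted above can be seen by evaluating the estimated equation at a few income levels. This is a minimal sketch that uses only the reported coefficients −0.9457 and 0.1021; the function name `lpm_predict` and the chosen income values are illustrative assumptions.

```python
def lpm_predict(income_thousands):
    """Fitted 'probability' of home ownership from the estimated LPM."""
    return -0.9457 + 0.1021 * income_thousands


for income in (0, 8, 10, 15, 20):
    print(income, round(lpm_predict(income), 4))

# The change in fitted probability per extra $1,000 is the same at every income level.
step_low = lpm_predict(11) - lpm_predict(10)
step_high = lpm_predict(21) - lpm_predict(20)
print(round(step_low, 4), round(step_high, 4))
```

Fitted values below zero at low incomes and above one at high incomes, together with the constant step, are the features that motivate the nonlinear models discussed later in the chapter.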

 The linear probability model is the regression model applied to a binary dependent variable.

 For example, Y can be defined to indicate whether an adult has a high school
education; whether a college student used illegal drugs, cheated, or committed a crime
during a given school year; or whether a firm was taken over by another firm during a given year.
 In each of these examples, we can let Y = 1 denote one of the outcomes and Y = 0 the
other outcome.
 What does it mean to write down a multiple regression model such as
   yi = β0 + β1X1i + β2X2i + β3X3i + ⋯ + βkXki + εi …….…….(1)
 where y is a binary variable?
 Because y can take on only two values, βj cannot be interpreted as the change in y given
a one-unit increase in Xj, holding all other factors fixed: y either changes from zero to one
or from one to zero (or does not change).
 Nevertheless, the βj still have useful interpretations. If we assume that the zero
conditional mean assumption of MLR holds, that is, E(ε | X1, …, Xk) = 0, then we
have, as always,
   E(y | x) = β0 + β1X1i + β2X2i + β3X3i + ⋯ + βkXki ………….…..(2)

 The key point is that when y is a binary variable taking on the values zero and one, it is
always true that P(y = 1 | x) = E(y | x): the probability of “success” (that is, the probability
that y = 1) is the same as the expected value of y.
 Thus, we have the important equation (LPM):
   P(y = 1 | x) = E(y | x) = β0 + β1X1i + β2X2i + β3X3i + … + βkXki = xβ …………(3)
 Equation (3) says that the probability of success, say, p(x) = P(y = 1 | x), is a linear function of
the Xj, the explanatory variables.

 Equation (3) is an example of a binary response model, and P(y = 1 | x) is also called the
response probability.
 Because probabilities must sum to one, P(y = 0 | x) = 1 − P(y = 1 | x) is also a linear
function of the Xj.
 The multiple linear regression model with a binary dependent variable is called the linear
probability model (LPM) because the response probability is linear in the parameters βj.
 In the LPM, βj measures the change in the probability of success when Xj changes,
holding other factors fixed: ∆P(y = 1 | x) = βj∆Xj
 Given this, the multiple linear regression model (MLRM) allows us to estimate the effect of
various explanatory variables on qualitative events.
 Example: let us write the equation estimated by OLS as:
   ŷ = β̂0 + β̂1X1 + β̂2X2 + β̂3X3 + ⋯ + β̂kXk …………….(4)
 Note that ŷ is the predicted probability of success.
 Thus, β̂0 is the predicted probability of success when each Xj is set to zero, which may
or may not be interesting.
 The slope coefficient β̂j measures the predicted change in the probability of success
when Xj increases by one unit.
 Estimation of a Labor Force Participation Model (for Married Women)

 Where:
 Inlf = in labor force
 nwifeinc = husband’s earnings, measured in thousands of dollars
 educ = years of education
 exper = past years of labor market experience,
 age = age of mother,
 kidslt6 = number of kids less than six years old, and
 kidsge6 = number of children between 6 and 18 years of age

 Interpretations:
 Only 26.4 percent of the variation in the dependent variable is collectively
explained by change in the explanatory variables considered.
 All explanatory variables except kidsge6 are statistically significant using t-test
or standard Error Test.

 The coefficient on educ means that, everything else held fixed, another year of
education increases the probability of labor force participation by 0.038.
 The coefficient on nwifeinc implies that, if the husband’s earnings increase by $10,000
(that is, Δnwifeinc = 10), the probability that a woman is in the labor force falls by 0.034.
 Having one additional child less than six years old (kidslt6) reduces the
probability of participation by 0.262, at given levels of the other variables.
 The remaining results are interpreted the same way.
Limitations of LPM
 The most important criticism is that E(y | x), interpreted as the probability that the event
will occur, can lie outside the limits (0, 1).
 The disturbances εi are not normally distributed (for given x they take only two values) and
they are heteroscedastic, so the least squares method is not, in general, fully efficient.
 In practice, ŷi(1 − ŷi) or pi(1 − pi), which estimates the variance of the error term, may be
negative (whenever ŷi falls outside the unit interval), although in large samples there is a very
small probability that this will be so.
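The third limitation can be illustrated with the home ownership equation reported earlier (Ŷi = −0.9457 + 0.1021Xi). The sketch below computes the implied error variance ŷ(1 − ŷ) at several hypothetical income levels; the function names are assumptions made for this example.

```python
def lpm_fitted(income_thousands):
    """Fitted value from the home ownership LPM reported in the text."""
    return -0.9457 + 0.1021 * income_thousands


def implied_error_variance(p):
    """Variance p(1 - p) of a binary outcome whose success probability is p."""
    return p * (1.0 - p)


for income in (8, 12, 15, 18, 20):
    p_hat = lpm_fitted(income)
    print(income, round(p_hat, 4), round(implied_error_variance(p_hat), 4))
```

The implied variance changes with income, which is the heteroscedasticity referred to above, and it turns negative whenever the fitted value is below zero or above one, so it cannot be used there as a weight for WLS.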
 EXERCISE 1
 Show P(y=1|x) = E (y|x)
 Prove that the error term is heteroscedastic, that is, that its variance is not constant.

1.3.2 The Logit and Probit Models

 The linear probability model (LPM) is simple to estimate and use, but it has some
drawbacks.
 Thus, we have alternative models for binary response variable: Logit and Probit Models.

Specifying Logit and Probit Models


 In the LPM, we assume that the response probability is linear in a set of parameters, βj.
 To avoid the LPM limitations, consider a class of binary response models of the form:
   P(y = 1 | x) = F(β0 + β1X1i + β2X2i + β3X3i + … + βkXki) = F(xβ) …………………. (5)

where F(·) is a function taking on values strictly between zero and one: 0 < F(z) < 1 for all
real numbers z. This ensures that the estimated response probabilities are
strictly between zero and one.
 Various nonlinear functions have been suggested for F(·) to make sure that the
probabilities are between zero and one.
 The most familiar are the logistic function and the standard normal cumulative
distribution function (CDF).

 In the logit model, F(·) is the logistic function:
   F(z) = 1/(1 + exp(−z)) = exp(z)/(1 + exp(z)) = Λ(z) …………………………………(6)
 This is the CDF of a standard logistic random variable.
 In the probit model, F(·) is the standard normal cumulative distribution function (CDF),
which is expressed as an integral:

   F(z) = Φ(z) = ∫_{−∞}^{z} φ(v) dv …………………………………….……….………………….…..(7)

 where φ(v) is the standard normal density:
   φ(v) = (2π)^(−1/2) exp(−v²/2)
 This choice of F(·), in both the logistic and the standard normal case, again ensures that
P(y = 1 | x) is strictly between zero and one for all values of the parameters and the Xki (or z).
 The F(·) functions in (6) and (7) are both increasing functions. Each increases most quickly at
z = 0; F(z) → 0 as z → −∞, and F(z) → 1 as z → ∞.

Figure: The plot of the logistic function F(z) = exp(z)/(1 + exp(z))

 The standard normal CDF (for probit model) has a shape very similar to that of the
logistic CDF.
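The two link functions can be evaluated with the Python standard library alone. The sketch below implements the logistic CDF from equation (6) and the standard normal CDF through the error function, using the identity Φ(z) = ½[1 + erf(z/√2)]; the function names are choices made for this example.

```python
import math


def logistic_cdf(z):
    """Logistic CDF: 1 / (1 + exp(-z)). Fine for moderate z; exp overflows for very negative z."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(z, round(logistic_cdf(z), 5), round(normal_cdf(z), 5))
```

Both functions equal 0.5 at z = 0 and stay inside the unit interval; at z = −4 the logistic value is visibly larger than the normal one, which reflects the heavier tails of the logistic distribution.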

1.3.2.1 The Logit Model
 Let’s consider the model for home ownership:
   P(Y = 1 | x) = E(Y | X) = β0 + β1Xi ………..……..………… (8)
 where Y = 1 implies that the family owns a house, so that P(Y = 1 | x) is the probability of
success, i.e., the probability that a family owns a house; Xi represents the income of a
family.
 Now, we need a model whose CDF will look like the above figure.
 Thus, we need to use the logistic function:
   Pi = P(Y = 1 | Xi) = 1/(1 + exp(−(β0 + β1Xi))) = exp(β0 + β1Xi)/(1 + exp(β0 + β1Xi))
 This is known as the (cumulative) logistic distribution function.

 In the probit model, the corresponding probability is P(Y = 1 | x) = Φ(z0) = ∫_{−∞}^{z0} φ(z) dz, where:
 P(Y = 1 | x) is the probability of success (e.g., the probability that a married woman
participates in the labor force) conditional on a set of explanatory variables (Xs).
 z0 is a threshold or cut point to be estimated.
 Z is the standard normal variable, i.e., z = (X − μ)/σ.
 Thus, the probability of success is given by the area between −∞ and z0 under the
standard normal curve.

How should we estimate nonlinear binary response models?


 To estimate the LPM, we can use OLS or, in some cases, WLS.
 However, because E(y | x) is nonlinear in the parameters, OLS and WLS are not applicable
to logit and probit models.
 Thus, we estimate logit and probit models by the method of maximum likelihood (ML).
What is the Maximum Likelihood Method?
 The foundation for the theory and practice of ML estimation is a probability model:
   Pr(Z ≤ z) = F(z; θ)
 where Z is a random variable distributed according to a cumulative probability
distribution function F(·) with parameter vector θ′ = (θ1, θ2, ..., θk) from Θ, which is the
parameter space for F(·).

 Typically, there is more than one variable of interest, so the model describes the joint
distribution of the random variables, with Z = (Z1, Z2, ..., Zk).
 Using F(·), we can compute probabilities for values of the Zs given values of the parameters
θ.
 In likelihood theory, we turn things around.
 Given observed values z of the variables, the likelihood function is
   ℓ(θ; z) = f(z; θ)
 where f(·) is the probability density function corresponding to F(·).
 The point is that we are interested in the element (vector) of Θ that was used to generate z.
We denote this vector by θT.
 Data typically consist of multiple observations on the relevant variables.
 In this case, we write:
   L(θ; Z) = f(Z; θ)
 where f(·) is now the joint density function and Z is the matrix of observations z.
 If we introduce the assumption that the observations are independent and identically
distributed (i.i.d.), we can rewrite the likelihood as a product:

   L(θ; Z) = ℓ(θ; z1) × ℓ(θ; z2) × ··· × ℓ(θ; zN) = ∏ ℓ(θ; zi), i = 1, …, N
 The ML estimates for θ are the values θ̂ such that
   L(θ̂; Z) = max over θ ∈ Θ of L(θ; Z)
 Most texts will note that the above is equivalent to finding θ̂ such that
   lnL(θ̂; Z) = max over θ ∈ Θ of lnL(θ; Z)
 This is true because L(·) is a positive function and ln(·) is a monotone increasing
transformation.
 Under the i.i.d. assumption, we can rewrite the log likelihood as a sum:
   lnL(θ; Z) = lnℓ(θ; z1) + lnℓ(θ; z2) + ··· + lnℓ(θ; zN)
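To make the idea of maximizing the log likelihood concrete, the following sketch fits a logit model with one regressor by Newton-Raphson iterations. It is an illustration under stated assumptions, not the routine used by any particular package: the ten observations are made up, the starting values and the fixed number of iterations are arbitrary choices, and the function names are hypothetical.

```python
import math

# Made-up data: x = family income (thousands), y = 1 if the family owns a house.
x = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p), with p = logistic(b0 + b1*x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(iterations=25):
    """Maximize the log likelihood by Newton-Raphson, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p             # gradient with respect to b0
            g1 += (yi - p) * xi      # gradient with respect to b1
            h00 += w                 # entries of minus the Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] * step = [g0, g1].
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = fit_logit()
print("estimates:", b0_hat, b1_hat)
print("log likelihood at estimates:", log_likelihood(b0_hat, b1_hat))
print("log likelihood at zero:", log_likelihood(0.0, 0.0))
```

At the maximum the gradient is zero, so the fitted probabilities sum to the number of observed successes; statistical packages perform the same kind of iteration and also report standard errors.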

1.3.3 Interpreting the Probit and Logit Model Estimates
 The estimates of both logit and probit models are usually obtained using statistical software
(STATA, SPSS, EViews, etc.).
 We do not directly interpret the coefficients of the variables; rather, we interpret their marginal
effects and odds ratios.

 The coefficients give the signs of the partial effects of each Xj on the response probability, and the
statistical significance of Xj is determined by whether we can reject H0: βj = 0 at a sufficiently
small significance level.
 However, the magnitude of the estimated parameters (dZ/dX) has no particular interpretation.
 We care about the magnitude of dProb(Y = 1)/dX, the change in the response probability.
 From the computer output for probit or logit estimation, you can interpret the statistical
significance and sign of each coefficient directly.
 In the linear regression model (LRM), the slope coefficient measures the change in the average
value of the regressand for a unit change in the value of a regressor, with all other variables held
constant.
 In the LPM, the slope coefficient measures directly the change in the probability of an event occurring
as the result of a unit change in the value of a regressor, with the effect of all other variables held
constant.
 For the logit model, the rate of change in the probability of an event happening is given by
βj pi(1 − pi), where βj is the (partial regression) coefficient of the jth regressor. But in
evaluating pi, all the variables included in the analysis are involved.
 In the probit model, the rate of change in the probability is somewhat complicated and is given by
βj f(Zi), where f(Zi) is the density function of the standard normal variable and
Zi = β1 + β2X2i + ··· + βkXki, that is, the regression model used in the analysis.
 Thus, in both the logit and probit models all the regressors are involved in computing the
changes in probability, whereas in the LPM only the jth regressor is involved.
 This difference may be one reason for the early popularity of the LPM.
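The marginal-effect formulas βj pi(1 − pi) for the logit and βj f(Zi) for the probit can be evaluated directly. In this sketch the coefficient value 0.5 and the index values z are hypothetical; the point is only that the same coefficient produces different changes in probability at different values of the index.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logit_marginal_effect(beta_j, z):
    """beta_j * p * (1 - p), with p the logistic CDF evaluated at the index z."""
    p = logistic_cdf(z)
    return beta_j * p * (1.0 - p)


def probit_marginal_effect(beta_j, z):
    """beta_j * f(z), with f the standard normal density."""
    return beta_j * normal_pdf(z)


beta_j = 0.5  # hypothetical coefficient
for z in (-2.0, 0.0, 2.0):
    print(z, round(logit_marginal_effect(beta_j, z), 4),
          round(probit_marginal_effect(beta_j, z), 4))
```

Because z depends on every regressor, the marginal effect of Xj has to be reported at chosen values of all the regressors, for example at their sample means.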

 Is the logit or the probit model preferable?

 In most applications the models are quite similar, the main difference being that the logistic
distribution has slightly fatter tails.
 That is to say, the conditional probability Pi approaches zero or one at a slower rate in logit than
in probit.
 Therefore, there is no convincing reason to choose one over the other.
 In practice many researchers choose the logit model because of its comparative mathematical
simplicity.
 The standard normal CDF has a shape very similar to that of the logistic CDF.
 However, the estimates of β from the two methods are not directly comparable.
 The reason is that, although both have a mean value of zero, their variances are different: 1 for
the standard normal and π²/3 for the logistic distribution, where π ≈ 22/7.
 Therefore, if you multiply the probit coefficient by about 1.81 (which is approximately π/√3), you
will get approximately the logit coefficient. This is equivalent to multiplying the logit estimates
by √3/π and then comparing them directly with the probit estimates.
 Amemiya (1981) suggested that the logit estimates be multiplied by 1/1.6 = 0.625 instead of √3/π,
saying that this transformation produces a closer approximation between the logistic
distribution and the distribution function of the standard normal.
 He also suggested that the coefficients of the LPM, β̂LP, and the coefficients of the logit model,
β̂Logit, are related by the relationships:
 β̂LP ≈ 0.25β̂Logit ……………………… except for the constant term
 β̂LP ≈ 0.25β̂Logit + 0.5 …………………… for the constant term
 Thus, if we need to make β̂LP comparable to the probit coefficients (β̂probit), we need to
multiply them (β̂LP) by 2.5 and subtract 1.25 from the constant term.
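The conversion factors quoted above are easy to tabulate. The sketch below simply encodes the rules of thumb from the text (π/√3 ≈ 1.81, Amemiya's 0.625, and the factor 2.5 with the 1.25 adjustment for the constant); the function names and the sample coefficient values are hypothetical, and the results are approximations rather than exact identities.

```python
import math

LOGISTIC_SD = math.pi / math.sqrt(3.0)  # standard deviation of the logistic distribution


def probit_to_logit(beta_probit):
    """Variance-based rule: multiply the probit coefficient by pi/sqrt(3)."""
    return beta_probit * LOGISTIC_SD


def logit_to_probit_amemiya(beta_logit):
    """Amemiya's rule: multiply the logit coefficient by 1/1.6 = 0.625."""
    return beta_logit * 0.625


def lpm_to_probit(beta_lp, is_constant=False):
    """Multiply the LPM coefficient by 2.5 and subtract 1.25 for the constant term."""
    return 2.5 * beta_lp - (1.25 if is_constant else 0.0)


print(round(LOGISTIC_SD, 4))
print(probit_to_logit(0.5), logit_to_probit_amemiya(0.8))
print(lpm_to_probit(0.1), lpm_to_probit(0.5, is_constant=True))
```

Such conversions are only meant to put estimates on a roughly common scale before comparing them across the three models.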

 An alternative way of comparing the models would be to:


a. Calculate the sum of squared deviations from the predicted probabilities,
b. Compare the percentages correctly predicted, and
c. Look at the derivatives of the probabilities with respect to a particular independent
variable.
