Chapter 1 Econometrics
DEPARTMENT OF ECONOMICS
Course Description:
This course is a continuation of Econometrics I. It aims at introducing the theory (and
practice) of regression on qualitative data, time series and panel data econometrics, as well as
simultaneous equation modeling. It first introduces the basic concepts of qualitative data
modeling, such as dummy variable regression and binary choice models (LPM, logit and probit).
Elementary time series models, estimation, and tests for stationarity of data will then be
discussed. It also covers an introduction to simultaneous equation modeling with alternative
estimation methods. Introductory pooled cross-sectional and panel data models will finally be
highlighted. All of these theoretical concepts will be complemented by computer lab practice
using statistical packages such as STATA, EViews, PcGive, etc., applied to available
Ethiopian/international data.
CHAPTER ONE
Economic data measures the financial health or wellbeing of a country, specific regions,
categories, or individual markets. Economic data are data describing an actual economy, past or
present. The success of any econometric analysis ultimately depends on the availability of the
appropriate data.
Types of data: data can be classified in several ways.
For instance, based on source, data can be divided into primary data and secondary data.
Data can also be divided into cross-sectional data, time series data and panel data (please
read).
Data can also be divided into qualitative data and quantitative data.
Quantitative data: data that are expressed as numbers, that is, facts that can be quantified.
Examples: income, prices, money and the like.
There are four types of variables that one generally encounters in empirical analysis: ratio
scale, interval scale, ordinal scale, and nominal scale.
Regression models may involve not only ratio scale variables but also nominal scale
variables.
Such variables are also known as indicator variables, categorical variables, qualitative
variables, or dummy variables.
In regression analysis the dependent variable is frequently influenced not only by variables
that can be readily quantified on some well-defined scale (e.g., income, output, prices, costs,
height, and temperature), but also by variables that are essentially qualitative in nature (e.g.,
sex, race, color, religion, nationality, wars, earthquakes, strikes, political upheavals, and
changes in government economic policy).
For example, holding all other factors constant, female college professors are found to earn
less than their male counterparts, and nonwhites are found to earn less than whites.
This pattern may result from sex or racial discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the dependent variable and clearly should be
included among the explanatory variables.
Since such qualitative variables usually indicate the presence or absence of a “quality” or an
attribute, such as male or female, black or white, or Christian or Muslim, one method of
“quantifying” such attributes is by constructing artificial variables that take on values of 1 or
0, 0 indicating the absence of an attribute and 1 indicating the presence (or possession) of
that attribute. For example, 1 may indicate that a person is a male, and 0 may designate a
female; or 1 may indicate that a person is a college graduate, and 0 that the person is not, and so on.
Variables that assume such 0 and 1 values are called dummy variables.
Alternative names are indicator variables, binary variables, categorical variables, and
dichotomous variables.
Dummy variables can be used in regression models just as easily as quantitative variables. As
a matter of fact, a regression model may contain explanatory variables that are exclusively
dummy, or qualitative, in nature.
Consider the model Yi = α + βDi + ui, where Di = 1 if the professor is male and Di = 0 if female.
The intercept term α gives the mean salary of female college professors, and the slope
coefficient β tells by how much the mean salary of a male college professor differs from the
mean salary of his female counterpart; α + β therefore reflects the mean salary of a male
college professor.
How can we test whether there is sex discrimination or not? A test of the null hypothesis
that there is no sex discrimination (H0: β = 0) can easily be made by running the regression in
the usual manner and finding out whether, on the basis of the t test, the estimated β is
statistically significant.
Example: Ŷi = 18,000 + 3,280Di
    (0.32) (0.44)
t = (57.74) (7.439)   R² = 0.8737
Based on this result, the estimated mean salary of a female college professor is birr 18,000 and
that of a male professor is birr 18,000 + 3,280 = birr 21,280.
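As a check on the mechanics, regressing salary on a constant and a dummy alone reproduces the two group means exactly: the intercept estimates the base-category mean and the dummy coefficient estimates the differential. The figures below are hypothetical (in thousands of birr), invented only for illustration:

```python
import numpy as np

# Hypothetical salaries (thousands of birr); D = 1 for male, 0 for female.
salary = np.array([17.5, 18.2, 18.3, 21.0, 21.4, 21.5])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# OLS of salary on a constant and the dummy: Y = alpha + beta*D + u
X = np.column_stack([np.ones_like(D), D])
alpha, beta = np.linalg.lstsq(X, salary, rcond=None)[0]

# alpha equals the female group mean; alpha + beta equals the male group mean.
print(alpha, beta)  # alpha ~ 18.0, beta ~ 3.3
```

On the actual data set behind the example, the same regression returns the reported intercept of 18,000 and dummy coefficient of 3,280.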
Figure: Hypothetical scatter gram between annual salary and years of teaching
experience of college professors.
If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions have the same intercept (i.e., there is no sex discrimination) can be made easily
by running the regression Yi = α1 + α2Di + βXi + ui and noting the statistical significance of
the estimated α2 on the basis of the traditional t test.
If the t test shows that α̂2 is statistically significant, we reject the null hypothesis that the
male and female college professors’ levels of mean annual salary are the same.
Before proceeding further, note the following features of the dummy variable regression
model considered previously.
i. To distinguish the two categories, male and female, we have introduced only one dummy
variable Di. For if Di = 1 always denotes a male, then when Di = 0 we know that it is a female,
since there are only two possible outcomes. Hence, one dummy variable suffices to
distinguish two categories. The general rule is this: if a qualitative variable has m categories,
introduce only m − 1 dummy variables. In our example, sex has two categories, and hence we
introduced only a single dummy variable. If this rule is not followed, we shall fall into what
might be called the dummy variable trap, that is, a situation of perfect multicollinearity.
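The trap itself is easy to demonstrate numerically: with an intercept plus one dummy per category (m dummies instead of m − 1), the dummy columns sum to the intercept column and the regressor matrix loses full column rank. A minimal sketch with made-up data:

```python
import numpy as np

# Six observations: first three male, last three female.
male = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
female = 1.0 - male  # the two dummy columns sum to the intercept column

# Intercept plus BOTH dummies: perfect multicollinearity (the trap).
X_trap = np.column_stack([np.ones(6), male, female])
print(np.linalg.matrix_rank(X_trap))  # 2, not 3: rank deficient

# Intercept plus m - 1 = 1 dummy: full column rank, OLS is well defined.
X_ok = np.column_stack([np.ones(6), male])
print(np.linalg.matrix_rank(X_ok))  # 2: full column rank
```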
ii. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary in
the sense that in our example we could have assigned D=1 for female and D=0 for male.
iii. The group, category, or classification that is assigned the value of 0 is often referred to as the
base, benchmark, control, comparison, reference, or omitted category. It is the base in the
sense that comparisons are made with that category.
iv. The coefficient α2 attached to the dummy variable D can be called the differential intercept
coefficient because it tells by how much the value of the intercept term of the category that
receives the value of 1 differs from the intercept coefficient of the base category.
1.2.3 Regression on one quantitative variable and one qualitative variable with
more than two classes
Suppose that, on the basis of the cross-sectional data, we want to regress the annual expenditure
on health care by an individual on the income and education of the individual.
Since the variable education is qualitative in nature, suppose we consider three mutually
exclusive levels of education: less than high school, high school, and college.
Now, unlike the previous case, we have more than two categories of the qualitative variable
education.
Therefore, following the rule that the number of dummies be one less than the number of
categories of the variable, we should introduce two dummies to take care of the three levels of
education.
Assuming that the three educational groups have a common slope but different intercepts in the
regression of annual expenditure on health care on annual income, we can use the following
model:
Yi = α1 + α2D2i + α3D3i + βXi + ui
where Yi = annual expenditure on health care, Xi = annual income, D2i = 1 if high school
education (0 otherwise), and D3i = 1 if college education (0 otherwise).
Note that in the preceding assignment of the dummy variables we are arbitrarily treating the
“less than high school education” category as the base category.
Therefore, the intercept α1 will reflect the intercept for this category.
The differential intercepts α2 and α3 tell by how much the intercepts of the other two
categories differ from the intercept of the base category, which can be readily checked as
follows. Assuming E(ui) = 0, we obtain:
E(Yi | D2i = 0, D3i = 0, Xi) = α1 + βXi (less than high school)
E(Yi | D2i = 1, D3i = 0, Xi) = (α1 + α2) + βXi (high school)
E(Yi | D2i = 0, D3i = 1, Xi) = (α1 + α3) + βXi (college)
Figure: Expenditure on health care in relation to income for three levels of education
Once again, it is assumed that the preceding regressions differ only in the intercept coefficient
but not in the slope coefficient (β).
An OLS estimation of this model will enable us to test a variety of hypotheses. (Consider,
analogously, a model of professors’ salaries with a sex dummy, coefficient α2, and a colour
dummy, coefficient α3.)
Thus, if α̂3 is statistically significant, it will mean that colour does affect a professor’s salary.
Similarly, if α̂2 is statistically significant, it will mean that sex also affects a professor’s salary.
If both these differential intercepts are statistically significant, it would mean that sex as well
as colour is an important determinant of professors’ salaries.
From the preceding discussion it follows that we can extend our model to include more than
one quantitative variable and more than two qualitative variables.
The only precaution to be taken is that the number of dummies for each qualitative variable
should be one less than the number of categories of that variable.
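The same bookkeeping carries over to estimation: with simulated data for three education groups sharing a common slope, OLS on an intercept, two dummies, and income recovers the differential intercepts. All numbers below (base intercept 100, differential intercepts 50 and 80, slope 0.2) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated health expenditure: base intercept 100, differential intercepts
# 50 (high school) and 80 (college), common slope 0.2 on income.
n = 300
income = rng.uniform(200.0, 1000.0, n)
group = rng.integers(0, 3, n)        # 0 = less than high school (base)
d2 = (group == 1).astype(float)      # high school dummy
d3 = (group == 2).astype(float)      # college dummy
y = 100.0 + 50.0 * d2 + 80.0 * d3 + 0.2 * income + rng.normal(0.0, 5.0, n)

X = np.column_stack([np.ones(n), d2, d3, income])
a1, a2, a3, b = np.linalg.lstsq(X, y, rcond=None)[0]
# a2 and a3 estimate the differential intercepts relative to the base group.
print(round(a2, 1), round(a3, 1))  # close to 50 and 80
```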
Interaction Effects: dummy variables can also be interacted with one another, or with
quantitative regressors, so that not only the intercepts but also the slopes are allowed to
differ across categories.
There are several methods to analyze regression models in which the dependent variable is binary.
Now let us turn our attention to the four most commonly used approaches to estimating binary
response models: the linear probability model (LPM), the logit model, the probit model, and the
tobit model.
In the 1960s and early 1970s the linear probability model was widely used, mainly because it
can be easily estimated using multiple regression analysis.
The term linear probability model (LPM) is used to denote a regression model in which the
dependent variable y is a dichotomous variable taking the value one (1) if the event occurs or
zero (0) otherwise.
When we use a linear regression model to estimate probabilities, we call the model the linear
probability model.
To fix ideas, consider the following regression model:
Yi = β1 + β2Xi + ui where X = family income and Y = 1 if the family owns a house and
0 if it does not own a house.
This looks like a typical linear regression model but, because the regressand is
binary, or dichotomous, it is called a linear probability model (LPM).
This is because E(Yi | Xi) can be interpreted as the conditional probability that the
event will occur given Xi, that is, Pr(Yi = 1 | Xi).
Thus, in our example, E(Yi | Xi) gives the probability that a family whose income is the
given amount Xi owns a house.
A numerical example of LPM on home ownership Y (1 = owns a house, 0 = does
not own a house) and family income X (thousands of dollars) for the LPM
estimated by OLS was as follows:
Ŷi = −0.9457 + 0.1021Xi
se = (0.1228) (0.0082)
t = (−7.6984) (12.515)   R² = 0.8048
The above regression is interpreted as follows: the slope 0.1021 means that for a unit (one
thousand dollar) increase in family income, the probability of owning a house increases on
average by about 0.10, while the intercept −0.9457 gives the fitted “probability” for a family
with zero income, which, being negative, has no sensible interpretation.
For example, Y can be defined to indicate whether an adult has a high school
education; whether a college student used illegal drugs, cheated, or committed a crime during
a given school year; or whether a firm was taken over by another firm during a given year.
In each of these examples, we can let Y = 1 denote one of the outcomes and Y= 0 the
other outcome.
What does it mean to write down a multiple regression model such as
Yi = β0 + β1X1i + β2X2i + β3X3i + ⋯ + βkXki + εi ……(1)
where Y is a binary variable?
Because Y can take on only two values, βk cannot be interpreted as the change in Y given
a one-unit increase in Xk, holding all other factors fixed: Y either changes from zero to one
or from one to zero (or does not change).
Nevertheless, the βk still have useful interpretations. If we assume that the zero
conditional mean assumption of MLR holds, that is, E(ε | X1, …, Xk) = 0, then we
have, as always,
E(y|x) = β0 + β1X1i + β2X2i + β3X3i + ⋯ + βkXki ……(2)
The key point is that when y is a binary variable taking on the values zero and one, it is
always true that P(y = 1 | x) = E(y|x): the probability of “success”, that is, the probability
that y = 1, is the same as the expected value of y.
Thus, we have the important equation (the LPM):
P(y = 1 | x) = E(y|x) = β0 + β1X1i + β2X2i + β3X3i + ⋯ + βkXki = xβ ……(3)
Equation (3) says that the probability of success, say, p(x) = P(y = 1 | x), is a linear function of
the Xj, that is, a linear combination of the explanatory variables.
Where:
Inlf = in labor force
nwifeinc = husband’s earnings, measured in thousands of dollars
educ = years of education
exper = past years of labor market experience,
age = age of mother,
kidslt6 = number of kids less than six years old, and
kidsge6 = number of children between 6 and 18 years of age
Interpretations:
Only 26.4 percent of the variation in the dependent variable is collectively
explained by changes in the explanatory variables considered.
All explanatory variables except kidsge6 are statistically significant, using the t test
or the standard error test.
The linear probability model (LPM) is simple to estimate and use, but it has some
drawbacks.
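One drawback shows up directly in the home-ownership fit reported earlier: nothing constrains the fitted values of an LPM to the unit interval. Plugging incomes into the estimated equation:

```python
def lpm_prob(x):
    """Fitted LPM from the home-ownership example: -0.9457 + 0.1021*X,
    with X = family income in thousands of dollars."""
    return -0.9457 + 0.1021 * x

print(lpm_prob(8))   # negative fitted "probability" at a low income
print(lpm_prob(22))  # fitted "probability" above one at a high income
```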
Thus, we have alternative models for a binary response variable: the logit and probit models.
These specify the response probability as P(y = 1 | x) = F(β0 + β1X1i + ⋯ + βkXki) = F(z),
where F(·) is a function taking on values strictly between zero and one: 0 < F(z) < 1 for all
real numbers z. This ensures that the estimated response probabilities are
strictly between zero and one.
Various nonlinear functions have been suggested for F(·) to make sure that the
probabilities are between zero and one.
Among others, the most familiar are the logistic function and the standard normal cumulative
distribution function (CDF).
This choice of F(·), in both the logistic and standard normal cases, again ensures that the
response probability is strictly between zero and one for all values of the parameters and
the Xki (or z).
The F(·) functions in (6) and (7) are both increasing functions. Each increases most quickly at
z = 0, with F(z) → 0 as z → −∞ and F(z) → 1 as z → ∞.
Figure: The plot of the logistic function F(z) = exp(z) / (1 + exp(z))
The standard normal CDF (for probit model) has a shape very similar to that of the
logistic CDF.
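Both functions are easy to evaluate with the standard library; the sketch below checks that each stays strictly between zero and one and equals 0.5 at z = 0:

```python
import math

def logistic_cdf(z):
    """Logistic function F(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def normal_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(logistic_cdf(0.0), normal_cdf(0.0))  # both 0.5 at z = 0
print(logistic_cdf(-4.0))  # near 0 as z -> -infinity
print(logistic_cdf(4.0))   # near 1 as z -> +infinity
```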
Typically, there is more than one variable of interest, so the model describes the joint
distribution of the random variables, with Z = (z1, z2, ... , zk).
Using F (.), we can compute probabilities for values of the Zs given values of the parameters
θ.
In likelihood theory, we turn things around.
Given observed values z of the variables, the likelihood function is
ℓ(θ; z) = f(z; θ)
where f(·) is the probability density function corresponding to F(·).
The point is that we are interested in the element (vector) of Θ that was used to generate z.
We denote this vector by θ_T.
Data typically consist of multiple observations on relevant variables.
In this case, we write:
L(θ; Z) = f(Z; θ)
where f(·) is now the joint distribution function and Z is the matrix of observations on the z’s.
If we introduce the assumption that the observations are independent and identically
distributed (i.i.d.), we can rewrite the likelihood as:
L(θ; Z) = ℓ(θ; z1) × ℓ(θ; z2) × ··· × ℓ(θ; zN) = ∏i ℓ(θ; zi)
The ML estimates for θ are the values θ̂ such that
L(θ̂; Z) = max over θ∈Θ of L(θ; Z)
Most texts will note that the above is equivalent to finding θ̂ such that
ln L(θ̂; Z) = max over θ∈Θ of ln L(θ; Z)
This is true because L () is a positive function and ln () is a monotone increasing
transformation.
Under the i.i.d. assumption, we can rewrite the log likelihood as:
lnL(θ; Z) = lnℓ(θ; z1) + lnℓ(θ; z2) +···+ lnℓ(θ; zN)
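For a logit model, this sum of log-likelihood contributions is exactly what gets maximized in practice. A minimal sketch on simulated data, using a hand-rolled Newton-Raphson iteration (the true parameters (−1, 2) are chosen arbitrarily for the simulation):

```python
import numpy as np

rng = np.random.default_rng(1)

def logit_loglik(beta, X, y):
    """lnL(beta) = sum of per-observation log-likelihood contributions."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Simulate binary outcomes with true parameters (-1, 2).
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)

# Newton-Raphson: beta <- beta + (X'WX)^{-1} X'(y - p), W = diag(p(1-p)).
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    score = X.T @ (y - p)                       # gradient of lnL
    H = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
    beta = beta + np.linalg.solve(H, score)

print(beta)  # close to (-1, 2)
```

Each Newton step raises the log likelihood, so the final estimates maximize lnL over the sample.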
In most applications the models are quite similar, the main difference being that the logistic
distribution has slightly fatter tails.
That is to say, the conditional probability Pi approaches zero or one at a slower rate in logit than
in probit.
Therefore, there is no convincing reason to choose one over the other.
In practice many researchers choose the logit model because of its comparative mathematical
simplicity.
However, the estimates of β from the two methods are not directly comparable.
The reason is that, although both distributions have a mean of zero, their variances differ:
1 for the standard normal and π²/3 for the logistic distribution, where π ≈ 22/7.
Therefore, if you multiply the probit coefficient by about 1.81 (which is approximately π/√3),
you will get approximately the logit coefficient. Equivalently, you can multiply the logit
estimates by √3/π and then compare them directly with the probit estimates.
Amemiya (1981) suggested that the logit estimates be multiplied by 1/1.6 = 0.625 instead of √3/π,
saying that this transformation produces a closer approximation between the logistic
distribution and the distribution function of the standard normal.
He also suggested that the coefficients of the LPM, β̂_LPM, and the coefficients of the logit
model, β̂_Logit, are related by:
β̂_LPM = 0.25 β̂_Logit (except for the constant term, for which β̂_LPM = 0.25 β̂_Logit + 0.5)
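The π²/3 figure is easy to verify by simulation using NumPy's standard logistic generator, and the same calculation yields the ≈1.81 rescaling factor:

```python
import math

import numpy as np

rng = np.random.default_rng(2)

# Sample variance of the standard logistic should be close to pi^2/3.
z = rng.logistic(size=1_000_000)
print(z.var())                 # about 3.29 = pi^2 / 3

# Ratio of standard deviations: sd(logistic)/sd(normal) = pi/sqrt(3).
scale = math.pi / math.sqrt(3.0)
print(scale)                   # about 1.81, the probit-to-logit factor
```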