Econometrics Module 2
Contents
1.0 Aims and Objectives
1.1 Definition of Econometrics
1.2 Goals of Econometrics
1.3 Division of Econometrics
1.4 Methodology of Econometrics
1.5 The Nature and Sources of Data for Econometrics Analysis
1.5.1 Types of Data
1.5.2 The Sources of Data
1.6 Summary
1.7 Answers to Check Your Progress
1.8 References
1.9 Model Examination Questions
The purpose of this unit is to let you know what econometrics is all about and to discuss the
scope, goals, division and methodology of econometric analysis.
Definition: Econometrics deals with the measurement of economic relationships.
Econometrics is a combination of economic theory, mathematical economics and statistics, but
it is completely distinct from each one of these three branches of science. The relationships and
differences among these sciences are pointed out below.
A. Economic theory makes statements or hypotheses that are mostly qualitative in nature.
Example: Microeconomic theory states that, other things remaining the same, a reduction in the price
of a commodity is expected to increase the quantity demanded of that commodity. But the
theory itself does not provide any numerical measure of the relationship between the two; that
is, it does not tell by how much the quantity will go up or down as a result of a certain change in
the price of the commodity. It is the job of the econometrician to provide such numerical
statements.
The econometrician often needs special methods since the data are not generated as the result of
a controlled experiment. This creates special problems not normally dealt with in mathematical
statistics. Moreover, such data are likely to contain errors of measurement, and the
econometrician may be called upon to develop special methods of analysis to deal with such
errors of measurement.
To conclude: Econometrics is an amalgam of economic theory, mathematical economics,
economic statistics, and mathematical statistics. Yet, it is a subject that deserves to be studied in
its own right for the above-mentioned reasons.
2. Policy-Making
In many cases we apply the various econometric techniques in order to obtain reliable estimates
of the individual coefficients of the economic relationships from which we may evaluate
elasticities or other parameters of economic theory (multipliers, technical coefficients of
production, marginal costs, marginal revenues, etc.). The knowledge of the numerical value of
these coefficients is very important for the decisions of firms as well as for the formulation of
the economic policy of the government. It helps to compare the effects of alternative policy
decisions.
3. Forecasting
In formulating policy decisions it is essential to be able to forecast the value of the economic
magnitudes. Such forecasts will enable the policy-maker to judge whether it is necessary to
take any measures in order to influence the relevant economic variables.
Econometrics may be divided into two broad branches: theoretical econometrics and applied econometrics.
iii) the mathematical form of the model (the number of equations, the linear or non-linear form of
these equations, etc.).
The specification of the econometric model will be based on economic theory and on any
available information relating to the phenomenon being studied. The econometrician must know
the general laws of economic theory, and furthermore he must gather any other information
relevant to the particular characteristics of the relationship, as well as all studies already
published on the subject by other research workers.
The most common errors of specification are:
- the omission of some variables from the functions
- the omission of some equations
- the mistaken mathematical form of the functions.
Evaluation of Estimates
After the estimation of the model the econometrician must proceed with the evaluation of the
results of the calculations, that is, with the determination of the reliability of these results. The
evaluation consists of deciding whether the estimates of the parameters are theoretically
meaningful and statistically satisfactory. Various criteria may be used.
- Economic a priori criteria: These are determined by the principles of economic
theory and refer to the sign and the size of the parameters of economic relationships. In
econometric jargon we say that economic theory imposes restrictions on the signs and
values of the parameters of economic relationships.
- Statistical criteria: These are determined by statistical theory and aim at the
evaluation of the statistical reliability of the estimates of the parameters of the model.
The most widely used statistical criteria are the correlation coefficient and the
standard deviation (or standard error) of the estimates. These concepts will be
discussed in the subsequent units. Note that the statistical criteria are secondary to
the a priori theoretical criteria: in general, the estimates of the parameters should be
rejected if they have the wrong sign or size, even though they pass the statistical
criteria.
- Econometric criteria: These are determined by econometric theory. They aim at
investigating whether the assumptions of the econometric method employed are
satisfied in any particular case. When the assumptions of an econometric
technique are not satisfied, it is customary to respecify the model.
Therefore, the final stage of any applied econometric research is the investigation of the
stability of the estimates and of their sensitivity to changes in the size of the sample.
One way of establishing the forecasting power of a model is to use the estimates of the model
for a period not included in the sample. The estimated value (forecast value) is compared with
the actual (realized) magnitude of the relevant dependent variable. Usually there will be a
difference between the actual and the forecast value of the variable, which is tested with the
aim of establishing whether it is (statistically) significant. If after conducting the relevant test of
significance, we find that the difference between the realized value of the dependent variable
and that estimated from the model is statistically significant, we conclude that the forecasting
power of the model, its extra-sample performance, is poor.
Another way of establishing the stability of the estimates and the performance of the model
outside the sample of data from which it has been estimated, is to re-estimate the function with
an expanded sample, that is a sample including additional observations. The original estimates
will normally differ from the new estimates. The difference is tested for statistical significance
with appropriate methods.
Suppose the estimated demand function is Q̂t = 100 + 5Yt − 30Pt, where Y denotes income and
P price. This equation is then used for 'forecasting' the demand for the commodity in the year
1970, a period outside the sample data. Given Y1970 = 1000 and P1970 = 5,

Q̂1970 = 100 + 5(1000) − 30(5) = 4,950 units.

If the actual demand for this commodity in 1970 is 4,500, there is a difference of 450 between
the value estimated from the model and the actual market demand for the product. The difference can
be tested for significance by various methods. If it is found significant, we try to find out what
are the sources of the error in the forecast, in order to improve the forecasting power of our
model.
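This comparison is easy to script. Below is a minimal Python sketch (an illustration, not part of the original module) using the hypothetical demand equation above; a formal test of whether the forecast error is statistically significant would require its standard error, which is discussed in later units.

```python
# Minimal sketch of the forecast check described above, using the
# hypothetical demand equation Q_t = 100 + 5*Y_t - 30*P_t from the text.

def forecast_demand(income, price, b0=100.0, b_income=5.0, b_price=-30.0):
    """Point forecast from the estimated demand equation."""
    return b0 + b_income * income + b_price * price

predicted = forecast_demand(income=1000, price=5)   # 4950 units
actual = 4500                                       # realized 1970 demand
forecast_error = predicted - actual                 # 450 units

print(f"forecast = {predicted}, actual = {actual}, error = {forecast_error}")
```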
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. Let us first discuss the types of data and then we will see the sources and
limitations of the data.
b) Cross-Section data
These data give information on the variables concerning individual agents (consumers or
producers) at a given point of time.
Example:
- the census of population conducted by the CSA
- the survey of consumer expenditure conducted by Addis Ababa University
Note that, due to heterogeneity, cross-sectional data have their own problems.
c) Pooled Data
These are repeated surveys of a single (cross-section) sample in different periods of time. They
record the behavior of the same set of individual microeconomic units over time, and thus
contain elements of both time-series and cross-sectional data.
Panel (or longitudinal) data, also called micropanel data, is a special type of pooled data in
which the same cross-sectional unit is surveyed over time.
The individual (researcher) himself may collect data through interviews or using questionnaire.
In the social sciences the data that one generally obtains is non-experimental in nature; that is,
it is not subject to the control of the researcher. For example, data on GNP, unemployment, stock
prices etc are not directly under the control of the investigator. This often creates special
problems for the researcher in pinning down the exact cause or causes affecting a particular
situation.
Limitations
Although there is plenty of data available for economic research, the quality of the data is often
not that good. Reasons are:
- Since most social science data are not experimental in nature, there is the possibility of
observational errors.
- Errors of measurement arising from approximations and round-offs.
- In questionnaire type surveys, there is the problem of non-response
- Respondents may not answer all the questions correctly
- Sampling methods used in obtaining data
- Economic data is generally available at a highly aggregate level. For example most macro
data like GNP, unemployment, inflation etc are available for the economy as a whole.
- Because of confidentiality, certain data can be published only in highly aggregate form. For
example, data on individual tax, production, employment, etc. at the firm level are usually
available only in aggregate form.
Because of all these and many other problems, the researcher should always keep in mind that
the results of research are only as good as the quality of the data. The results of the
research may therefore be unsatisfactory due to the poor quality of the available data rather
than due to a wrong model.
1.6. SUMMARY
Definition of Econometrics
Economic theory, mathematical economics and statistics
Methodology of econometrics:
C) Evaluation of Estimates
Criteria for evaluation of the estimates
- Economic a priori criteria: determined by the principles of economic theory; they refer to
the sign and the size of the parameters of economic relationships.
- Statistical criteria: determined by statistical theory; the most widely used are the
correlation coefficient and the standard deviation (or standard error) of the estimates.
- Econometric criteria: determined by econometric theory.
Types of Data
There are three types of data:
A) Time series data (which may be quantitative, or qualitative and coded as dummy or categorical variables)
B) Cross-section data
C) Pooled data (panel data being a special case)
The Sources of Data
3. The results of research are only as good as the quality of the data. Explain it.
4. Mention some of the reasons for the poor forecasting power of the estimated model.
Contents
2.0 Aims and Objectives
2.1 The Concept of Regression Analysis
2.2 Population Regression Function Vs Sample Regression Function
2.3 The Method of Ordinary Least Squares
2.4 Statistical Test of Significance and Goodness of Fit
2.5 Confidence Interval and Prediction
2.6 Summary
2.7 Answers to Check Your Progress
2.8 Model Examination
2.9 References
This unit introduces the key idea behind regression analysis. The objective of such analysis is
to estimate and/or predict the mean or average value of the dependent variable on the basis of
the known or fixed values of the explanatory variables.
After studying this unit you should be able to apply the ordinary least squares method in a
two-variable regression analysis and interpret the results.
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables, with a view to
estimating and/or predicting its mean value in terms of the known or fixed values of the latter.
Basically, the existence of the disturbance term is justified in four main ways.
i) Omission of other variables: although income might be the major determinant of the
level of consumption, it is not the only determinant. Other variables such as interest
rate, or liquid asset holdings may have a systematic influence on consumption. Their
omission constitutes one type of specification error, and the disturbance term is often
viewed as capturing the net effect of such omitted variables.
ii) Measurement error: it may be the case that the variable being explained cannot be
measured accurately, either because of data collection difficulties or because it is
inherently unmeasurable, so that a proxy variable must be used instead. The disturbance
term can in these circumstances be thought of as representing this measurement error
of the variable(s).
iii) Randomness in human behavior: humans are not machines that will do as instructed,
so there is an unpredictable element. For example, for reasons left unexplained, an increase in
income may not influence consumption. The disturbance term captures such human
behavior that is left unexplained by the economic model.
iv) Imperfect specification of the model: for example, we may have linearized a non-linear
function; if so, the random term will reflect this wrong specification.
Generally speaking regression analysis is concerned with the study of the dependency of one
dependent variable on one or more other variables called the explanatory variable(s) or the
independent variable(s). Moreover, the true relationship that connects the variables involved is
split into two components: systematic (or explained) variation and random (or unexplained)
variation. Using (2.2) we can disaggregate the two components as follows:

Y = β0 + β1X + U

That is,

[variation in Y] = [systematic variation] + [random variation]
In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear model like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in two
different ways. These are,
a) Linearity in the variables
b) Linearity in parameters
That is, dY/dX = 2β1X. Hence the above function is not linear in X, since the variable X
appears with a power of 2.
ii) Linearity in the parameters: this implies that the parameters (i.e., the β's) are raised to the
first degree only. In this interpretation Y = β0 + β1X² is a linear regression model, but
Y = β0 + β1²X is not. The latter is an example of a non-linear (in the parameters) regression
model. Of the two interpretations of linearity, linearity in the parameters is the one relevant for
the development of regression theory. Thus the term linear regression means a regression that
is linear in the parameters, the β's; it may or may not be linear in the explanatory variables.
The following discussion stress that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s).
[Figure 2.1: the population regression line (PRF), plotting the conditional mean of Y against X]
Figure 2.1 (the above line) is known as the population regression line or, more generally, the
population regression curve. Geometrically, a population regression curve is simply the locus of
the conditional means or expectations of the dependent variable for the fixed value of the
explanatory variables.
From the preceding discussion it is clear that each conditional mean E(Y|Xi) is a function of Xi:

E(Y|Xi) = f(Xi) = β0 + β1Xi ....................................... (2.3)
Note that in real situations we do not have the entire population available for examination. Thus
the functional form that f(X) assumes is an important question. This is an empirical question,
although in specific cases theory may have something to say.
E(Y|Xi) = β0 + β1Xi
where β0 and β1 are unknown but fixed parameters known as the regression coefficients
(intercept and slope coefficients respectively). The above equation is known as the linear
population regression function. But since consumption expenditure does not necessarily
increase as income level increases we incorporate the error term. That is,
Yi = E(Y|Xi) + Ui
   = β0 + β1Xi + Ui .......................................................(2.4)
Note that in table 2.1 we observe that for the same value of X (e.g. 100) we have different values
of Y (65, 70, 75 and 80). Thus the value of Y is also affected by other factors that can be
captured by the error term, U.
If we take the expected value of (2.4) conditional on Xi, we obtain

E(Yi|Xi) = E(Y|Xi) + E(Ui|Xi) .................................(2.5)

Since E(Yi|Xi) = E(Y|Xi), it implies that

E(Ui|Xi) = 0 ...............................(2.6)
Thus, the assumption that the regression line passes through the conditional means of Y implies
that the conditional mean values of Ui (conditional upon the given X's) are zero.
The regression function based on a sample collected from the population is called sample
regression function (SRF).
[Figure 2.2: Sample regression function, plotting consumption expenditure against income]
Hence, analogous to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample regression function (which is a counterpart of the PRF stated earlier) may be written as:
Ŷi = β̂0 + β̂1Xi + Ûi ...............................(2.7)

where β̂0 and β̂1 are estimators of β0 and β1, and Ûi is an estimator of Ui.
To sum up, because our analysis is based on a single sample from some population our primary
objective in regression analysis is to estimate the PRF given by
Yi = β0 + β1Xi + Ui
on the basis of
Ŷi = β̂0 + β̂1Xi + Ûi
We employ SRF because in most of the cases our analysis is based upon a single sample from
some population. But because of sampling fluctuations our estimate of the PRF based on the
SRF is at best an approximate one.
For Xi to the left of the point A, the SRF will underestimate the true PRF. Such over- and
underestimation is inevitable because of sampling fluctuations.
Note that there are several methods of constructing the SRF, but as far as regression analysis is
concerned, the method that is used most extensively is that of Ordinary Least Squares (OLS).
In other words, how should the SRF be constructed so that β̂0 is as “close” as possible to the
true β0 and β̂1 is as “close” as possible to the true β1, even though we never know the true β0
and β1? We can develop procedures that tell us how to construct the SRF to mirror the PRF as
faithfully as possible, even though we never actually determine the PRF itself.
The method of ordinary least squares has some very attractive statistical properties that have
made it one of the most powerful and popular methods of regression analysis.
Thus β0 + β1X represents the systematic (explained) variation, and Ui the random (unexplained) variation.
However, the PRF is not directly observable. Hence we estimate it from the SRF. That is,

Yi = β̂0 + β̂1Xi + Ûi = Ŷi + Ûi

so that

Ûi = Yi − Ŷi = Yi − β̂0 − β̂1Xi

This shows that the Ûi (the residuals) are simply the differences between the actual and the
estimated Y values.
The diagram below reveals this relationship. Note from the figure that Ûi represents the
difference between the actual Y and the Ŷ on the SRF. The Ûi could be widely spread about the
SRF, but this is discouraged under the least-squares procedure, for the larger the Ûi (in absolute
value), the larger the Ûi².
In other words, the least-squares method allows us to choose β̂0 and β̂1 as estimators of β0 and β1,
respectively, so that

Σ(Yi − β̂0 − β̂1Xi)²

is a minimum.
If the deviation of the actual values from the estimated ones is at its minimum, then our
estimation from the collected sample provides a very good approximation of the true
relationship between the variables.
Note that to estimate the coefficients β0 and β1 we need observations on X, Y and U. Yet U is
never observed like the other variables, and therefore in order to estimate the function
Yi = β0 + β1Xi + Ui we should make some reasonable (plausible) assumptions about the shape
of the distribution of each Ui (i.e., its mean, variance and covariance with other U's). These
assumptions are guesses about the true, but unobservable, values of Ui.
Thus the linear regression model is based on certain assumptions, some of which refer to the
distribution of the random variable Ui, some to the relationship between U i and the explanatory
variables, and finally some refer to the relationship between the explanatory variables
themselves. The following are the assumptions underlying the method of least squares.
Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X (explanatory) values are fixed in repeated sampling. Values taken by the
regressor X are considered fixed in repeated samples; more technically, X is assumed to be
non-stochastic. In other words, our regression analysis is conditional regression analysis, that
is, conditional on the given values of the regressor(s) X.
E.g., recall that for a fixed X value of 100 we have Y values of 65, 70, 75 and 80. Hence X
is assumed to be non-stochastic.
Assumption 3: Ui is a random real variable. The value which U may assume in any one period
depends on chance. It may be positive, negative or zero. Each value has a certain probability of
being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. This means that for each value of X, U may
assume various values, some greater than zero and some smaller than zero, but if we consider
all the possible values of U, for any given value of X, they would have an average value equal
to zero. Hence the mean or expected value of the random disturbance term U i is zero.
Symbolically, we have:

E(Ui|Xi) = 0

That means the mean value of Ui conditional upon the given Xi is zero. For example, from
table 2.1 we can show that when X = 100 the values of Ui are −7.5, −2.5, 2.5 and 7.5, so their
average is zero.
Assumption 5: Homoscedasticity or equal variance of Ui. Symbolically,

Var(Ui|Xi) = E[Ui − E(Ui|Xi)]² = E(Ui²|Xi) = σ² ..........................................(2.9)
Recall that the last equality holds because of assumption 4. Equation (2.9) states that the
variance of Ui for each Xi is some positive constant number equal to σ² (equal variance). This
means that the Y populations corresponding to the various X values have the same variance.
Consider the following figures.
[Figure 2.5: Variance of the error term for each Xi; panel (a) shows equal variance at X1, X2, X3, panel (b) unequal variance]
Note that in both cases the distribution of the error term is normal; that is, the values of U (for
each Xi) have a bell-shaped symmetrical distribution about their zero mean. But in figure (a)
the variance of the error term (and hence of the Y values) is equal in each period (i.e., at all
values of X), whereas in figure (b) there is unequal spread or variance of the error term. The
latter situation is known as heteroscedasticity.
Under heteroscedasticity,

Var(Ui|Xi) = σi²

where the subscript i on σ² indicates that the variance of the Y population is no longer constant.
To understand the rationale behind this assumption, refer to figure (b), where
Var(U|X1) < Var(U|X3).
Therefore, the likelihood is that the Y observations coming from the population with X = X1
would be closer to the PRF than those coming from the population corresponding to X = X3. In
short, all Y values corresponding to the various X's will not be equally reliable, reliability being
judged by how closely or distantly the Y values are distributed around their means.
Stated differently this assumption is saying that all Y values corresponding to the various X’s
are equally important since they have the same variance. Thus assumption 5 implies that the
conditional variances of Yi are also homoscedastic. That is,

Var(Yi|Xi) = σ²
Combining this with the normality noted above, the random variable Ui has the distribution
Ui ~ N(0, σ²); that is, Ui is normally distributed with zero mean and constant variance σ².
Assumption 6: No autocorrelation between the disturbances. Given any two X values, Xi and
Xj (i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is zero. This implies that the error
term committed for the ith observation is independent of the error term committed for the jth
observation. This is also known as no serial correlation.
Symbolically,

cov(Ui, Uj|Xi, Xj) = E{[Ui − E(Ui)]|Xi}·{[Uj − E(Uj)]|Xj}
                  = E(Ui|Xi)·E(Uj|Xj)
                  = 0

Note that if i = j then we are dealing with assumption five, because E(Ui|Xi)(Uj|Xj) then
becomes E(Ui²) = σ².
No autocorrelation implies that, given X, the deviations of any two Y values from their means
do not exhibit a systematic pattern such as the ones shown in the following figure. Figures (a)
and (b) would imply that Ui depends on Uj, so that Yt = β0 + β1Xt + Ut depends not only on Xt
but also on Ut−1, since Ut−1 to some extent determines Ut. Note that figure (c) shows no
systematic pattern in the U's, thus indicating zero correlation.
Assumption 7: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the error term is
independent of the explanatory variable(s). If the two are uncorrelated, it means that X and U
have separate influences on Y; but if X and U are correlated, it is not possible to assess their
individual effects on Y. Since we have assumed that the X values are fixed (or non-random) in
repeated samples, there is no way for X to co-vary with the error term. Thus, assumption 7 is
not very crucial.
Assumption 8: The regression model is correctly specified.
This assumption implies that there is no specification bias or error in the model used in
empirical analysis. It means that we have included all the important regressors explicitly in
the model and that its mathematical form is correct.
Assumption 9: There is no perfect multicollinearity.
That is, there are no perfect linear relationships among the explanatory variables. This
assumption applies in the case of multiple linear regression: if there is more than one
explanatory variable in the relationship, it is assumed that they are not perfectly correlated with
each other. Indeed, the regressors should not even be strongly correlated; they should not be
highly multicollinear.
Assumption 10: The number of observations, n must be greater than the number of parameters
to be estimated. Alternatively the number of observations must be greater than the number of
explanatory variables.
At this point one may ask: 'how realistic are these assumptions, really?' Note that in any
scientific study we make certain assumptions because they facilitate the development of the
subject matter in a gradual step, not because they are necessarily realistic in the sense that they
replicate reality exactly. What we plan to do is first study the properties of Classical Linear
Regression Model thoroughly and then in unit four we examine what happens if one or more of
the assumptions are not fulfilled.
Note that the OLS method demands that the deviations of the actual from the estimated Y-values
(i.e., Yi − Ŷi) be as small as possible. This method provides us with unique estimates of β0 and β1.
That is, the sum of squared residuals is to be minimized with respect to β̂0 and β̂1:

∂ΣÛi²/∂β̂0 = 0 .....................................(2.10)

and

∂ΣÛi²/∂β̂1 = 0 ..................................... (2.11)

The partial differentiation of (2.8) with respect to β̂0 gives

∂ΣÛi²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) = 0 .................... (2.12)

In the same way, the partial differentiation of (2.8) with respect to β̂1 gives

∂ΣÛi²/∂β̂1 = −2ΣXi(Yi − β̂0 − β̂1Xi) = 0 ........................ (2.13)
Rearranging (2.12) and (2.13) we obtain the normal equations:

ΣYi = nβ̂0 + β̂1ΣXi ......................................... (2.14)

ΣXiYi = β̂0ΣXi + β̂1ΣXi² ......................................... (2.15)

From (2.15),

β̂1 = (ΣXiYi − β̂0ΣXi)/ΣXi² .......................................... (2.16)

Substituting (2.16) in place of β̂1 in (2.14) and solving, we get

β̂0 = (ΣXi²·ΣYi − ΣXi·ΣXiYi)/(nΣXi² − (ΣXi)²) .......................................... (2.17)

Equivalently, from (2.14),

β̂0 = (ΣYi − β̂1ΣXi)/n = Ȳ − β̂1X̄ ....................................... (2.19)

Solving the normal equations for the slope gives

β̂1 = (nΣXiYi − ΣXiΣYi)/(nΣXi² − (ΣXi)²) ......................................... (2.20)
Therefore, equations (2.17) or (2.19) and (2.20) are the least square estimates since they are
obtained using the least square criteria.
Note that (2.20) is expressed in terms of the original sample observations on X and Y. It can be
shown that the estimate β̂1 may also be obtained by the following formula, which is expressed in
deviations of the variables from their means:

β̂1 = Σxiyi/Σxi² ............................................. (2.21)

where xi = Xi − X̄ and yi = Yi − Ȳ.
In other words,

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² ............................................ (2.22)
Proof:

Σxiyi = Σ(Xi − X̄)(Yi − Ȳ)
      = ΣXiYi − ȲΣXi − X̄ΣYi + nX̄Ȳ
      = ΣXiYi − (ΣXi·ΣYi)/n

since X̄ = ΣXi/n and Ȳ = ΣYi/n. Similarly,

Σxi² = Σ(Xi − X̄)² = ΣXi² − 2X̄ΣXi + nX̄² = ΣXi² − (ΣXi)²/n

Hence,

β̂1 = Σxiyi/Σxi² = [ΣXiYi − (ΣXi·ΣYi)/n] / [ΣXi² − (ΣXi)²/n]
   = (nΣXiYi − ΣXiΣYi)/(nΣXi² − (ΣXi)²)

which is (2.20).
In this event we should estimate the function Y = β0 + β1X + U by imposing the restriction
β0 = 0. This is a restricted minimization problem: we minimize

ΣÛi² = Σ(Y − β̂0 − β̂1X)²

subject to β̂0 = 0.
Note that estimation of elasticities is possible from an estimated regression line. Recall that the
SRF Ŷi = β̂0 + β̂1Xi is the equation of a line whose intercept is β̂0 and whose slope is β̂1. The
point elasticity of Y with respect to X, evaluated for example at the sample means, is then β̂1·(X̄/Ȳ).
In passing note that the least square estimators (i.e. ˆ 0 and ̂ 1 ) are point estimators, that is,
given the sample, each estimator will provide only a single (point) value of the relevant
population parameter.
In conclusion, the regression line obtained using the least squares estimators has the following
properties:
i) It passes through the sample means of Y and X. Recall that we obtained β̂0 = Ȳ − β̂1X̄, which
can be rewritten as Ȳ = β̂0 + β̂1X̄.
Example: Consider the following table, which is constructed using raw data on X and Y,
where the sample size is 10.

Yi     Xi     YiXi     Xi²      xi = Xi−X̄   yi = Yi−Ȳ   xi²     xiyi
70     80     5600     6400     −90          −41          8100    3690
65     100    6500     10000    −70          −46          4900    3220
90     120    10800    14400    −50          −21          2500    1050
95     140    13300    19600    −30          −16          900     480
110    160    17600    25600    −10          −1           100     10
115    180    20700    32400    10           4            100     40
120    200    24000    40000    30           9            900     270
140    220    30800    48400    50           29           2500    1450
155    240    37200    57600    70           44           4900    3080
150    260    39000    67600    90           39           8100    3510
Sum    1110   1700     205500   322000       0            0       33000   16800
Mean   111    170      –        –            0            0       –       –

Note that columns 3 to 8 in the above table are constructed using the information given in
columns 1 and 2.
We can compute β̂0 for the above tabulated figures by applying the formula given in (2.17); that is,

β̂0 = [(322,000)(1,110) − (1,700)(205,500)] / [10(322,000) − (1,700)²] = 8,070,000/330,000 ≈ 24.45

Similarly, we can compute β̂1 by using the formula given in (2.20) or (2.21). Using (2.20), we obtain:

β̂1 = 168,000/330,000 = 0.51
Notice that once we compute β̂1, we can also calculate β̂0 very easily using (2.19):
β̂0 = Ȳ − β̂1X̄ = 111 − (0.51)(170) ≈ 24.4. The estimated regression line is therefore

Ŷi = 24.4 + 0.51Xi .......... (2.26)

Interpretation of (2.26) reveals that when family income increases by 1 Birr, the estimated
consumption expenditure increases by β̂1 = 0.51 Birr, i.e., 51 cents.
The value of β̂0 = 24.4 (which is the intercept) indicates the average level of consumption
expenditure when family income is zero.
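As an aside, the computations above can be verified with a few lines of code. The following minimal Python sketch (illustrative, not part of the original module) reproduces β̂0 and β̂1 from the raw data of the table using formulas (2.19) and (2.21):

```python
# OLS estimates from the ten (Y, X) pairs in the table above,
# using the deviation formulas (2.19) and (2.21).

Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

n = len(Y)
x_bar = sum(X) / n          # 170
y_bar = sum(Y) / n          # 111

# deviations from the means
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]

b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)  # (2.21)
b0 = y_bar - b1 * x_bar                                               # (2.19)

print(f"slope b1 = {b1:.4f}")      # ~0.5091, i.e. 0.51
print(f"intercept b0 = {b0:.4f}")  # ~24.4545, i.e. 24.4
```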
As noted in the previous discussion, given the assumptions of the classical linear regression
model, the least squares estimators possess some ideal or optimum properties. These properties
are contained in the well-known Gauss-Markov theorem.
To understand this theorem, we need to consider the best linear unbiasedness property of an
estimator. That is, an estimator, say the OLS estimator β̂i, is said to be the best linear unbiased
estimator (BLUE) of βi if the following hold:
i. Linear estimator: it is a linear function of a random variable such as the dependent variable Y.
For example, the sample mean

Ȳ = (1/n)ΣYi = (Y1 + Y2 + ... + Yn)/n

is a linear function of the Yi.
ii. Unbiased estimator: an estimator is said to be unbiased if its average or expected value
equals the true parameter; that is, E(β̂) = β.
Figure 2.7 Unbiased and biased estimators (Using the sampling distribution to illustrate bias)
iii. Minimum Variance estimator (or best estimator) An estimator is best when it has the
smallest variance as compared with any other estimate obtained from econometric
methods. Symbolically, β̂ is best if

E[β̂ − E(β̂)]² < E[β̃ − E(β̃)]²

or, more formally,

Var(β̂) < Var(β̃)

for any alternative estimator β̃.
An unbiased estimator with the least variance is known as an efficient estimator.
The following figure shows the sampling distributions of two alternative estimators, β̂ and β̃.
Notice that the property of minimum variance in itself is not important. An estimate may have a
very small variance and a large bias: we have a small variance around the “wrong” mean.
Similarly, the property of unbiasedness by itself is not particularly desirable, unless coupled
with a small variance.
We can prove that the least squares estimators are BLUE provided that the random term U
satisfies some general assumptions, namely that U has zero mean and constant variance.
This proposition, together with the set of conditions under which it is true, is known as the
Gauss-Markov least squares theorem.
[Figure: nested boxes showing that the set of all estimators contains the set of all linear estimators, which in turn contains the set of linear unbiased estimators]
The box above reveals that not all estimators are linear, and not all linear estimators are
unbiased. The unbiased linear estimators are a subset of the linear estimators, and within the
group of linear unbiased estimators the OLS estimator β̂ has the smallest variance. Hence OLS
possesses three properties, namely linearity, unbiasedness and minimum variance.
We now derive the variances of the least squares estimates, β̂0 and β̂1. First note that β̂1 is a
linear function of the Yi:

β̂1 = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi²
   = ΣxiYi/Σxi²    (since Σxi = 0)
   = ΣKiYi

where Ki = xi/Σxi².
Var(β̂1) = E[β̂1 − E(β̂1)]²

Notice that since E(β̂1) = β1, it follows that

Var(β̂1) = E(β̂1 − β1)²

By rearranging (2.29), the above result can be written as

Var(β̂1) = E(ΣKiUi)²
        = E(K1²U1² + K2²U2² + ... + Kn²Un² + 2K1K2U1U2 + ... + 2Kn−1KnUn−1Un)

Recall that Var(U) = σu² = E[Ui − E(Ui)]², which is equal to E(Ui²) because E(Ui) = 0.
Furthermore, E(UiUj) = 0 for i ≠ j. Thus it follows that

Var(β̂1) = σu²ΣKi² = σu²/Σxi² .................................................... (2.31)
Thus, the standard error (s.e.) of β̂1 is given by

s.e.(β̂1) = σu/√(Σxi²) ................................................... (2.32)
It follows that the variance (and s.e.) of β̂0 can be obtained following the same line of
reasoning. Recall from (2.19) that β̂0 = Ȳ − β̂1X̄. Moreover, averaging the PRF over the sample
gives Ȳ = β0 + β1X̄ + Ū. Substituting this for Ȳ we obtain

β̂0 = β0 + β1X̄ + Ū − β̂1X̄ = β0 − (β̂1 − β1)X̄ + Ū

Therefore,

Var(β̂0) = E(β̂0 − β0)²
        = E[−(β̂1 − β1)X̄ + Ū]²
        = E[X̄²(β̂1 − β1)² + Ū² − 2X̄(β̂1 − β1)Ū] ................................ (2.33)
        = X̄²E(β̂1 − β1)² + E(Ū²) − 2X̄E[(β̂1 − β1)Ū]
Note that

E(Ū²) = E[(U1 + U2 + ... + Un)/n]² = (1/n²)E(U1 + U2 + ... + Un)²

Since E(Ui²) = σu² for each i and E(UiUj) = 0 for i ≠ j, the cross terms vanish and

E(Ū²) = (1/n²)(nσu²) = σu²/n

For the same reason, E[(β̂1 − β1)Ū] = 0.
Therefore, using this information we can evaluate (2.33) to obtain

Var(β̂0) = X̄²(σu²/Σxi²) + σu²/n
        = σu²(1/n + X̄²/Σxi²)

or, since Σxi² + nX̄² = ΣXi²,

Var(β̂0) = σu²·ΣXi²/(nΣxi²) .................................................. (2.34)

and hence

s.e.(β̂0) = σu·√(ΣXi²/(nΣxi²)) ................................................. (2.35)
Moreover, the covariance between β̂0 and β̂1 describes how β̂0 and β̂1 are related:

Cov(β̂0, β̂1) = E[β̂0 − E(β̂0)]·[β̂1 − E(β̂1)]

Using the expression β̂0 − β0 = −X̄(β̂1 − β1) + Ū obtained above, we can rewrite this as

Cov(β̂0, β̂1) = E[(−X̄(β̂1 − β1) + Ū)(β̂1 − β1)]
            = E[Ū(β̂1 − β1)] − X̄E(β̂1 − β1)²
            = 0 − X̄E(β̂1 − β1)² = −X̄·σu²/Σxi²
Note from (2.31) and (2.34) that the formulas for the variances of β̂0 and β̂1 involve the
variance of the random term U, σu². However, the true variance of Ui cannot be computed since
the values of Ui are not observable. But we may obtain an unbiased estimate of σu² from the
expression

σ̂u² = ΣÛi²/(n − k) ................................................. (2.37)

where k (which is 2 in this case) stands for the number of parameters and hence n − k represents
the degrees of freedom.
Remember that

ΣÛi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1Xi)² .............. (2.38)

a) We define the total variation of the observed Yi around their mean as the sum of squared
deviations yi = Yi − Ȳ:

[Total variation] = Σyi² = Σ(Yi − Ȳ)² .............. (2.39)

We squared the simple deviations because Σyi = 0.
b) In the same way, we define the deviations of the regressed (i.e., estimated from the line)
values Ŷi from the mean value, ŷi = Ŷi − Ȳ. This is the part of the total variation of Yi
which is explained by the regression line. Thus, the sum of the squares of these deviations
is the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi − Ȳ)² ............................... (2.40)
c) Recall that we have defined the error term Ûi as the difference Ûi = Yi − Ŷi. This is the part
of the variation of the dependent variable which is not explained by the regression line and is
attributed to the existence of the disturbance variable U. Thus the sum of the squared
residuals gives the total unexplained variation of the dependent variable Y around its mean:

[Unexplained variation] = ΣÛi² = Σ(Yi − Ŷi)² ...................................... (2.41)
Putting the pieces together, we have the fundamental decomposition

Σyi² = Σŷi² + ΣÛi² ....................................................... (2.44)

[Total variation] = [Explained variation] + [Unexplained variation] ............... (2.45)
Note that because the OLS estimator minimizes the sum of squared residuals (i.e., the
unexplained variation), it automatically maximizes r². Thus maximization of r² as a criterion of
an estimator is formally identical to the least squares criterion. Dividing (2.44) by TSS on both
sides, we obtain

1 = ESS/TSS + RSS/TSS
In terms of (2.44) the above result can be rewritten as

1 = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)² + ΣÛi²/Σ(Y − Ȳ)² ........................................... (2.46)
We now define r² as

r² = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)² = Σŷi²/Σyi² ..................................................... (2.47)
Notice that (2.47) is nothing but ESS/TSS. Thus r², which is the square of the correlation
coefficient r, determines the proportion of the variation of Y which is explained by variations
in X. For this reason r² is also called the coefficient of determination: it measures the proportion
of the total variation in Y explained by the regression model.
Equivalently,

r² = 1 − RSS/TSS = 1 − ΣÛi²/Σ(Y − Ȳ)² ................................................. (2.48)
The relationship between r² and the slope β̂1 indicates that r² may be computed in various
ways, given by the following formulas:

r² = β̂1·Σxy/Σy² ................................................... (2.49)

   = β̂1²·Σx²/Σy² ................................................... (2.50)
Note that if we are working with cross-section data, an r² value equal to 0.5 may be a good fit,
but for time-series data 0.5 may be too low. This means that there is no hard-and-fast rule as to
how high r² should be. Generally, however, the higher the value of r², the better the fit.
Replacing σu² by σ̂u² = ΣÛi²/(n − k), where n = 10 and k = 2, we get σ̂u² = 42.16. Then

Var(β̂0) = 42.16(322,000)/(10(33,000)) = 41.13

Var(β̂1) = 42.16/33,000 = 0.0013

We can calculate r² by using (2.47), (2.48), (2.49) or (2.50). For this example we use (2.49)
and (2.50):

Using (2.49), r² = (0.51)(16,800)/8,890 = 0.96

Using (2.50), r² = (0.51)²(33,000)/8,890 = 0.96
In addition to r², the statistical reliability of the estimates (β̂0, β̂1) should be tested. That is,
since β̂0 and β̂1 are sample estimates of the parameters β0 and β1, the significance of the
parameter estimates should be examined. Note that, given the assumption of a normally
distributed error term, the least squares estimators are themselves normally distributed.
Among a number of tests in this regard, we will examine the standard error test. This test helps
us to decide whether the estimates β̂0 and β̂1 are significantly different from zero, i.e.,
whether the sample from which they have been estimated might have come from a population
whose true parameters are zero (β0 = 0 and/or β1 = 0). Formally, we test the null hypothesis
H0: βi = 0 against the alternative H1: βi ≠ 0.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
Sometimes we have a strong a priori or theoretical expectation (or an expectation based on some
previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather
than two-sided, as just discussed.
For instance, in a consumption-income function C = β0 + β1Y one could postulate that:

H0: β1 ≤ 0.3
H1: β1 > 0.3

That is, perhaps economic theory or prior empirical work suggests that the marginal propensity
to consume (β1) is greater than 0.3. [Note: Students are strongly advised to refer to and grasp the
discussion in units 7 and 8 of the course Statistics for Economics.]
Recall that in order to test a hypothesis of the kind discussed above we need to make use of the
Z- and t-tests.
b) The Z-test of the least squares estimates
Recall from the Statistics for Economics course that the Z-test is applicable only if
a) the population variance is known, or
b) the population variance is unknown, and provided that the sample size is sufficiently
large (n > 30).
In econometric applications the population variance of Y is unknown. However, if we have a
large sample (n > 30) we may still use the standard normal distribution and perform the Z test.
If these conditions cannot be fulfilled, we apply the student’s t-test.
Recall that in our Statistics for Economics course we learned the formula which transforms the
value of any variable X into t units:

t = (Xi − X̄)/SX

where SX is the sample standard deviation and n is the sample size.
Accordingly, the variable

t = (β̂i − βi)/√Var(β̂i) = (β̂i − βi)/s.e.(β̂i) ............................................... (2.51)

follows the t distribution with n − k degrees of freedom, where

βi = hypothesized value of βi
Var(β̂i) = estimated variance of β̂i (from the regression)
n = sample size
k = total number of estimated parameters
If the computed value t* falls in the rejection region, i.e., |t*| > tα/2 with n − k degrees of
freedom, we reject the null hypothesis; that is, we accept that the estimate β̂i is statistically
significant.

[Figure: the t distribution, with the acceptance region between −tα/2 and tα/2 and a rejection region in each tail]

If, on the other hand, t* falls in the acceptance region, we accept H0, which implies that β̂i has
an insignificant or marginal contribution to the model.
Recall that if it is a one-tailed test, the rejection region is found only on one side. Hence, we
reject H0 if t* > tα (or, for a left-tailed test, if t* < −tα).
Note that the t-test can be performed in an approximate way by simple inspection. For
(n − k) > 8, if the observed t* is greater than 2 (or smaller than −2), we reject the null
hypothesis at the 5 percent level of significance. If, on the other hand, the observed t* is
smaller than 2 (but greater than −2), we accept the null hypothesis at the 5% level of significance.
Given (2.51), the sample value of t* would be greater than 2 if the relevant estimate (β̂0 or β̂1)
is at least twice its standard error. In other words, we reject the null hypothesis if

t* > 2, i.e., if β̂i > 2 s.e.(β̂i) ........................................................... (2.53)

or, equivalently, if s.e.(β̂i) < β̂i/2.
Example: Suppose that from a sample of size n = 20 we estimate the following consumption
function:

Ĉ = 100 + 0.70Y
    (75.5)  (0.21)

where the figures in parentheses are standard errors. For β̂0 = 100 with s.e.(β̂0) = 75.5,

t* = 100/75.5 = 1.32

For β̂1 = 0.70 with s.e.(β̂1) = 0.21,

t* = 0.70/0.21 = 3.3
Note that for β0, since the calculated value (1.32) is less than the table value (2.10), we cannot
reject H0: β0 = 0. Thus the estimate of β0 is statistically insignificant. But for β̂1, since the
calculated value (3.3) is greater than the table value (2.10), we reject H0: β1 = 0, indicating
that the estimate of β1 is indeed significant in the relationship between the two variables.
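A small Python sketch of this standard-error test for the consumption function above (illustrative, not part of the module; the critical value 2.101 for 18 degrees of freedom is taken from a t-table):

```python
# Standard-error (t) test for C_hat = 100 + 0.70*Y, n = 20, k = 2.

n, k = 20, 2
t_critical = 2.101  # two-tailed 5% critical value for n - k = 18 df (t-table)

estimates = {"b0": (100.0, 75.5), "b1": (0.70, 0.21)}  # (estimate, s.e.)

for name, (b, se) in estimates.items():
    t_star = b / se          # test of H0: coefficient = 0
    verdict = "reject" if abs(t_star) > t_critical else "do not reject"
    print(f"{name}: t* = {t_star:.2f} -> {verdict} H0")
```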
In conclusion, note that if a researcher obtains a high r² value and the estimates have low
standard errors, then the result is good. In practice, however, such an ideal situation is rare.
Rather, we may have a low r² value and low standard errors, or a high r² value but high
standard errors. There is no agreement among econometricians in this case, so the main issue is
whether to give priority to a high r² or to low standard errors of the parameter estimates.
In general, r² is more important if the model is to be used for forecasting. Standard errors
become more important when the purpose of the exercise is the explanation or analysis of
economic phenomena and the estimation of reliable values of the economic relationship.
Recall what we have said about constructing confidence interval in the course “Statistics for
Economics”. We said that in confidence interval analysis first we determine the probability
level. This is referred to as the confidence level (or confidence coefficient). Usually the 95%
confidence level is chosen. This means that in repeated sampling the confidence limits,
computed from the sample, would include the true population parameter in 95 percent of the
cases. In the other 5 percent of the cases the population parameter will fall outside the
confidence limit.
The confidence interval can be constructed by the standard normal distribution or the t-
distribution
i) Confidence Interval from the Standard Normal Distribution (Z-distribution)
Recall that the Z-distribution may be employed either if we know the true standard deviation
(of the population) or when we have a large sample (n > 30). This is because, for large samples,
the sample standard deviation is a reasonably good estimate of the unknown population
standard deviation.
Z = (β̂i − βi)/s.e.(β̂i) .................................................. (2.54)

where s.e. = standard error.
Our first task is to choose a confidence coefficient, say 95 percent. We next look at the standard
normal table and find that the probability of the value of Z lying between −1.96 and 1.96 is
0.95. This may be written as follows:

P[β̂i − 1.96 s.e.(β̂i) ≤ βi ≤ β̂i + 1.96 s.e.(β̂i)] = 0.95

or βi = β̂i ± 1.96 s.e.(β̂i)
Example: Given β̂i = 9 and s.e.(β̂i) = 2, and choosing a value of 95 percent for the confidence
coefficient,

Solution: we find the confidence interval to be

βi = 9 ± 1.96(2)

i.e., 5.08 ≤ βi ≤ 12.92
Thus, from our single sample estimate we are 95% confident that the (unknown) true
population parameter will lie between 5.08 and 12.92.
If β̂i = 8.4 and s.e.(β̂i) = 2.2, construct the 95 percent confidence interval for βi.
ii) Confidence Interval from the t-distribution

Recall that

t = (β̂i − βi)/s.e.(β̂i) with (n − k) degrees of freedom

For a 95 percent confidence coefficient,

P(−t0.025 < (β̂i − βi)/s.e.(β̂i) < t0.025) = 0.95

Rearranging this we obtain

P[β̂i − t0.025·s.e.(β̂i) < βi < β̂i + t0.025·s.e.(β̂i)] = 0.95

Thus the 95 percent confidence interval for βi, when we use a small sample for its estimation, is

βi = β̂i ± t0.025·s.e.(β̂i) with (n − k) degrees of freedom
Example: Given the following regression from a sample of 20 observations:

Ŷi = 128.5 + 2.85Xi
     (38.2)   (0.85)

where the figures in parentheses are standard errors, construct the 95% confidence intervals
for the intercept and the slope.
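As a worked sketch of this exercise (in Python, for illustration): with n = 20 and k = 2 there are 18 degrees of freedom, and the two-tailed 5% t value from a t-table is approximately 2.101.

```python
# 95% confidence intervals for the regression above: b +/- t * s.e.

t_crit = 2.101  # t_{0.025} with 18 degrees of freedom (from a t-table)

for name, b, se in [("intercept", 128.5, 38.2), ("slope", 2.85, 0.85)]:
    low, high = b - t_crit * se, b + t_crit * se
    print(f"{name}: {low:.2f} to {high:.2f}")
# intercept: ~48.24 to ~208.76; slope: ~1.06 to ~4.64
```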
where Ŷi is the estimator of the true E(Yi) corresponding to a given X. Note that there are two
kinds of predictions in this regard:
i) Prediction of the conditional mean value of Y (mean prediction) and
ii) Prediction of an individual Y value corresponding to X0 (individual prediction)
a) Mean prediction
Suppose that we are interested in the prediction of the conditional mean of Y in (2.54)
corresponding to a chosen X, say X0. To fix the idea, assume that X0 = 100 and we want to
predict E(Y/X0 = 100). It can be shown that regression (2.54) provides the point estimate of this
mean prediction as follows:
Ŷ0 = β̂0 + β̂1X0
Note that since Yˆ0 is an estimator, it is likely to be different from its true value. The difference
between the two values will give some idea about the prediction or forecast error. In order to
see this we need to know the mean and the variance of Ŷ0. The variance is given by

Var(Ŷ0) = σu²[1/n + (X0 − X̄)²/Σxi²] ............................................. (2.56)
Recall that σ̂² = RSS/(n − k) = ΣÛi²/(n − k).
What we can infer from the above result is that the variance (and the standard error) increases
the further away the value of X0 is from X̄. Therefore the variable

t = (Ŷ0 − (β0 + β1X0))/s.e.(Ŷ0) = (β̂0 + β̂1X0 − β0 − β1X0)/s.e.(Ŷ0) ............................................ (2.58)

follows the t distribution with n − 2 degrees of freedom.
Now, suppose Var(Ŷ0) = 10.4759, where n = 10. We can construct the 95% confidence interval
for the true E(Y|X0) = β0 + β1X0. Note that the table value t0.025 for 8 degrees of freedom is
2.306. Moreover, recall that we obtained Ŷ0 = 75.36. Thus, the 95% confidence interval is given by

75.36 − 2.306·√10.4759 ≤ E(Y|X0 = 100) ≤ 75.36 + 2.306·√10.4759
b) Individual Prediction
If our interest lies in predicting an individual Y value, Y0, corresponding to a given X value, say
X0, then the application in forecasting is called individual prediction
Consider again the estimated line

Ŷ0 = 24.45 + 0.51X0

As discussed earlier, we can give the point estimate of Ŷ0 corresponding to a given value of X0, say 100.
The prediction error is

Ŷ0 − Y0 = (β̂0 + β̂1X0) − (β0 + β1X0 + U0)

and its estimated variance is

var(Ŷ0 − Y0) = σ̂u²[1 + 1/n + (X0 − X̄)²/Σ(Xi − X̄)²] ............................................... (2.61)

Note that the variance increases the further away the value of X0 is from X̄. The corresponding
standard error is

s.e.(Ŷ0 − Y0) = √var(Ŷ0 − Y0) ............................................... (2.62)
The standardized prediction error, t = (Y0 − Ŷ0)/s.e.(Ŷ0 − Y0), follows a t-distribution with
n − 2 degrees of freedom. Therefore, the t-distribution can be used to draw inferences about
the true Y0. Continuing with the above example, the point prediction of Y0 is 75.36. Suppose
that its variance is 52.63. Thus, the 95% confidence interval for Y0 corresponding to X0 = 100
is seen to be

75.36 − 2.306·√52.63 ≤ (Y0|X0 = 100) ≤ 75.36 + 2.306·√52.63

i.e., 58.63 ≤ (Y0|X0 = 100) ≤ 92.09
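Both interval predictions can be computed mechanically. A minimal Python sketch using the variances quoted in the text (illustrative, not part of the module):

```python
# Mean and individual prediction intervals at X0 = 100, using the
# variances from the text: Var(Yhat0) = 10.4759 (mean prediction) and
# Var(Yhat0 - Y0) = 52.63 (individual prediction); n = 10, so 8 df.

import math

y_hat0 = 75.36
t_crit = 2.306  # t_{0.025} with 8 degrees of freedom

for label, var in [("mean prediction", 10.4759), ("individual prediction", 52.63)]:
    half_width = t_crit * math.sqrt(var)
    print(f"{label}: {y_hat0 - half_width:.2f} to {y_hat0 + half_width:.2f}")
# individual prediction: ~58.63 to ~92.09, matching the interval above
```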
2.6 SUMMARY
The overall goodness of fit of the regression model is measured by the coefficient of
determination, r². It tells what proportion of the variation in the dependent variable (the
regressand) is explained by the explanatory variable (the regressor). This r² lies between 0
and 1; the closer it is to 1, the better the fit.
Hypothesis testing answers the question of whether a given finding is compatible with a
stated hypothesis or not. In hypothesis testing the Z-test and t test are used, among others.
If the model is deemed practically adequate, it may be used for forecasting (prediction).
In this regard we distinguish mean prediction and individual prediction.
5. One way of solving this problem is to use the Lagrangean function. Recall the concept of
constrained optimization from the Calculus for Economics course. The Lagrangean
function associated with the problem is developed as follows:
L = Σ(Y − β̂0 − β̂1X)² + λβ̂0

where λ is the Lagrangean multiplier. We minimize the function with respect to β̂0, β̂1 and λ:

∂L/∂β̂0 = −2Σ(Y − β̂0 − β̂1X) + λ = 0 ------------------------- (a)

∂L/∂β̂1 = −2ΣX(Y − β̂0 − β̂1X) = 0 ------------------------ (b)

∂L/∂λ = β̂0 = 0 ------------------------- (c)

Substituting (c) into (b) and re-arranging we obtain

−2ΣX(Y − β̂1X) = 0, and hence

β̂1 = ΣXY/ΣX²
a) Ŷi = 2.69 - 0.48 Xi
c) r2 = 0.66
For no. 1:
First we need the point estimate of Ŷ0 given X0 = 60:

Ŷ0 = 6.7 + 0.25(60) = 21.7

The 95% confidence interval is then (12.95, 30.45).
Thus, given X = 60 in repeated sampling, 95 out of 100 cases will include the true mean value
of Y in the interval given above.
For no. 2:
This is an individual prediction problem.
i) For X0 = 850,

Ŷ0 = 31.76 + 0.71(850) = 635.2

ii) Note that s.e.(Ŷ0) = 5.68. The value of t0.025 for 10 degrees of freedom is 2.23. Hence, the
95% confidence interval is given by

Y0 = 635.2 ± 2.23(5.68)

i.e., 622.5 < Y0 < 647.9

So we are 95% confident that the forecast value of Y (= Y0) will lie between 622.5 and 647.9.
The following table includes GDP(X) and the demand for food (Y) for a certain country over
ten year period.
Year 1980 81 82 83 84 85 86 87 88 89
Y 6 7 8 10 8 9 10 9 11 10
X 50 52 55 59 57 58 62 65 68 70
2. Calculate elasticity of Y with respect to X at their mean value and interpret your result.
3. Compute r2 and find the explained and unexplained variation in the food expenditure.
4. Compute the standard error of the regression estimates and conduct tests of significance at
the 5% significant level.
5. Find the 95% confidence interval for the population parameter (β0 and β1)
Ŷi = 31.76 + 0.71Xi, with r² = 0.99 and σ̂u² = 285.61
7. Construct a 95% confidence interval for the result you obtained in (6). [Hint: use individual
prediction approach]
Contents
3.0 Aims and Objectives
3.1 Introduction
3.2 Specification of the Model
3.3 Assumptions
3.4 Estimation
3.5 The Coefficient of Multiple Determination
3.6 Test of Significance in Multiple Regression
3.7 Forecasting Based on Multiple Regression
3.8 The Method of Maximum Likelihood (ML)
3.9 Summary
3.10 Answers to Check Your Progress
3.11 References
3.12 Model Examination Question
The purpose of this unit is to introduce you to the concept of the multiple linear regression
model and show how the method of OLS can be extended to estimate the parameters of such
models.
3.1 INTRODUCTION
We have studied the two-variable model extensively in the previous unit. But in economics you
hardly ever find that one variable is affected by only one explanatory variable. For example, the
demand for a commodity is dependent on price of the same commodity, price of other
competing or complementary goods, income of the consumer, number of consumers in the
market, etc. Hence the two-variable model is often inadequate in practical work. Therefore, we
need to discuss multiple regression models. The multiple linear regression is entirely concerned
with the relationship between a dependent variable (Y) and two or more explanatory variables
(X1, X2, …, Xn).
Let us start our discussion with the simplest multiple regression model i.e., model with two
explanatory variables.
Y = f(X1, X2)
Example: The demand for a commodity may be influenced not only by the price of the commodity
but also by the consumer's income.
Since the theory does not specify the mathematical form of the demand function, we assume
the relationship between Y, X1, and X2 is linear. Hence we may write the three variable
Population Regression Function (PRF) as follows:
Yi = β0 + β1X1i + β2X2i + Ui
3.3 ASSUMPTIONS
To complete the specification of our simple model we need some assumptions about the
random variable U. These assumptions are the same as those assumptions already explained in
the two-variables model in unit 2.
3.4 ESTIMATION
We have specified our model in the previous subsection. We have also stated the assumptions
required in subsection 3.3. Now let us take sample observations on Y, X1i and X2i and obtain
estimates of the true parameters β0, β1 and β2:
Yi X1i X2i
Y1 X11 X21
Y2 X12 X22
Y3 X13 X23
Yn X1n X2n
The sample regression function (SRF) can be written as

Yi = β̂0 + β̂1X1i + β̂2X2i + Ûi

where β̂0, β̂1 and β̂2 are estimates of the true parameters β0, β1 and β2.
As discussed in unit 2, the estimates will be obtained by choosing the values of the unknown
parameters that minimize the sum of squares of the residuals (OLS requires that ΣÛi² be as
small as possible). Symbolically,

Min ΣÛi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²
A necessary condition for a minimum is that the partial derivatives of the above expression
with respect to the unknowns (i.e., β̂0, β̂1 and β̂2) be equal to zero:

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂0 = 0

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂1 = 0

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂2 = 0
Solving this system (with the variables in deviation form) yields

β̂1 = (Σx1iyi·Σx2i² − Σx2iyi·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)

β̂2 = (Σx2iyi·Σx1i² − Σx1iyi·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)
The estimates are unbiased estimates of the true parameters of the relationship between Y, X1
and X2: the expected value of each estimate is the true parameter itself.
The variances of the slope estimates β̂1 and β̂2 are

Var(β̂1) = σ̂u²·Σx2i² / (Σx1i²·Σx2i² − (Σx1ix2i)²)

Var(β̂2) = σ̂u²·Σx1i² / (Σx1i²·Σx2i² − (Σx1ix2i)²)

where σ̂u² = ΣÛi²/(n − k), k being the total number of parameters that are estimated (in the
above three-variable model, k = 3), and x1 and x2 are in deviation form.
In unit 2 we saw the coefficient of determination (r²) that measures the goodness of fit of the
regression equation. This notion of r² can be easily extended to regression models containing
more than two variables.
The quantity that gives this information is known as the multiple coefficient of determination. It
is denoted by R², with subscripts indicating the variables whose relationship is being studied.
Example: R²y.x1x2 shows the percentage of the total variation of Y explained by the
regression plane, that is, by changes in X1 and X2.
R²y.x1x2 = Σŷi²/Σyi² = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)²

        = 1 − ΣÛi²/Σyi² = 1 − RSS/TSS
where: RSS – residual sum of squares
TSS – total sum of squares
Recall that

ŷi = β̂1x1i + β̂2x2i    (the variables are in deviation form)

yi = ŷi + Ûi

ΣÛi² = Σ(yi − ŷi)² = Σ(yi − β̂1x1i − β̂2x2i)²

or ΣÛi² = ΣÛi·Ûi = ΣÛi(yi − β̂1x1i − β̂2x2i) = ΣÛiyi − β̂1ΣÛix1i − β̂2ΣÛix2i
The Adjusted R2
Note that as the number of regressors (explanatory variables) increases the coefficient of
multiple determinations will usually increase. To see this, recall the definition of R2
R² = 1 − ΣÛi²/Σyi²
Now Σyi² is independent of the number of X variables in the model because it is simply
Σ(Yi − Ȳ)². The residual sum of squares (RSS), ΣÛi², however, depends on the number of
explanatory variables present in the model. It is clear that as the number of X variables
increases, ΣÛi² is bound to decrease (at least it will not increase); hence R² will increase.
Therefore, in comparing two regression models with the same dependent variable but differing
numbers of X variables, one should be very wary of choosing the model with the highest R². An
explanatory variable which is not statistically significant may be retained in the model if one
looks at R² only. Therefore, to correct for this defect we adjust R² by taking into account the
degrees of freedom, which clearly decrease as new regressors are introduced into the function.
R̄² = 1 − [ΣÛi²/(n − k)] / [Σyi²/(n − 1)]

or

R̄² = 1 − (1 − R²)·(n − 1)/(n − k)

where k = the number of parameters in the model (including the intercept term),
n = the number of sample observations, and
R² = the unadjusted multiple coefficient of determination.
If n is large, R̄² and R² will not differ much. But with small samples, if the number of
regressors (X's) is large in relation to the number of sample observations, R̄² will be much
smaller than R².
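A one-function Python sketch of this adjustment (illustrative; the example values assume the wheat-yield regression of section 3.6 below, where R² = 0.98, n = 7 and k = 3):

```python
# Adjusted R-squared, as defined above.

def adjusted_r2(r2, n, k):
    """R2 adjusted for degrees of freedom; k counts all parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# e.g. the three-variable wheat-yield example: R2 = 0.98, n = 7, k = 3
print(adjusted_r2(0.98, n=7, k=3))  # 0.97
```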
The principle involved in testing multiple regressions is identical with that of simple regression.
We can test whether a particular variable X₁ or X₂ is significant or not holding the other
variable constant. The t test is used to test a hypothesis about any individual partial regression
coefficient. The partial regression coefficient measures the change in the mean value of Y,
E(Y/X₂,X₃), per unit change in X₂, holding X₃ constant.
t = β̂ᵢ/S(β̂ᵢ) ~ t(n – k)   (i = 0, 1, 2, …, k)
This is the observed (or sample) value of the t ratio, which we compare with the theoretical
value of t obtainable from the t-table with n – k degrees of freedom.
The theoretical values of t (at the chosen level of significance) are the critical values that define
the critical region in a two-tail test, with n – k degrees of freedom.
If the computed t value exceeds the critical t value at the chosen level of significance, we may
reject the hypothesis; otherwise, we may accept it (β̂ᵢ is not significant at the chosen level of
significance). Assume α = 0.05; then t_α/2 = 2.179 for 12 df.
[Figure: t distribution with a 95% acceptance region and critical regions of 2.5% in each tail]
For a number of degrees of freedom higher than 8 the critical value of t (at the 5% level of
significance) for the rejection of the null hypothesis is approximately 2.
The above joint hypothesis can be tested by the analysis of variance (AOV) technique. The
following table summarizes the idea.

Source of variation        Sum of squares   Degrees of freedom
Due to regression (ESS)    Σŷᵢ²             k – 1
Due to residual (RSS)      ΣUᵢ²             n – k
Total (total variation)    Σyᵢ²             n – 1
Therefore, to undertake the test, first find the calculated value of F and compare it with the F
tabulated. The calculated value of F can be obtained by using the following formula:
F = [Σŷᵢ²/(k – 1)] / [ΣUᵢ²/(n – k)] = [ESS/(k – 1)] / [RSS/(n – k)]
which follows the F distribution with k – 1 and n – k df.
When R² = 0, F is zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is
infinite. Thus the F test, which is a measure of the overall significance of the estimated
regression, is also a test of the significance of R². Testing the null hypothesis that all slope
coefficients are jointly zero is equivalent to testing the null hypothesis that the (population) R² is zero.
Example: Suppose we have data on wheat yield (Y), amount of fertilizer applied (X₁), and
amount of rainfall (X₂). It is assumed that the fluctuations in yield can be explained by varying
levels of rainfall and fertilizer.
Table 3.6.1
(1) Yield (Y)  (2) Fertilizer (X₁)  (3) Rainfall (X₂)  (4) yᵢ  (5) x₁ᵢ  (6) x₂ᵢ  (7) x₁ᵢyᵢ  (8) x₂ᵢyᵢ  (9) x₁ᵢx₂ᵢ
40             100                  10
50             200                  20
50             300                  10
70             400                  30
65             500                  20
65             600                  20
80             700                  30
ΣY = 420       ΣX₁ = 2800           ΣX₂ = 140
Ȳ = 60         X̄₁ = 400             X̄₂ = 20   (means)
(Columns 4 to 9 hold the deviations from the means and their cross-products, computed below.)
1. Find the OLS estimators (i.e., β̂₀, β̂₁ and β̂₂).
Solution: The formulas for β̂₀, β̂₁ and β̂₂ are
β̂₀ = Ȳ – β̂₁X̄₁ – β̂₂X̄₂
β̂₁ = (Σx₁ᵢyᵢ·Σx₂ᵢ² – Σx₂ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
β̂₂ = (Σx₂ᵢyᵢ·Σx₁ᵢ² – Σx₁ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
where the x’s and y’s are in deviation form.
Now find the deviations of the observations from their mean values (columns 4 to 9 in the
above table). The next step will be to insert the following values (in deviations) into the above
formulas: Σx₁ᵢ² = 280,000; Σx₂ᵢ² = 400; Σx₁ᵢx₂ᵢ = 7,000; Σx₁ᵢyᵢ = 16,500; Σx₂ᵢyᵢ = 600.
β̂₁ = [(16,500)(400) – (600)(7,000)] / [(280,000)(400) – (7,000)²] = 2,400,000/63,000,000 = 0.0381
β̂₂ = [(600)(280,000) – (16,500)(7,000)] / [(280,000)(400) – (7,000)²]
   = (168,000,000 – 115,500,000)/63,000,000 = 52,500,000/63,000,000 = 0.833
Now β̂₀ = 60 – (0.0381)(400) – (0.833)(20) = 28.1
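The arithmetic above can be checked with a short Python sketch that reproduces the Table 3.6.1 estimates from the deviation-form formulas (Python is used here purely for illustration):

```python
import numpy as np

# Reproducing the Table 3.6.1 estimates with the deviation-form formulas.
Y  = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)          # yield
X1 = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)   # fertilizer
X2 = np.array([10, 20, 10, 30, 20, 20, 30], dtype=float)          # rainfall

y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()          # deviations

den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2          # 63,000,000
b1 = (np.sum(x1*y)*np.sum(x2**2) - np.sum(x2*y)*np.sum(x1*x2)) / den
b2 = (np.sum(x2*y)*np.sum(x1**2) - np.sum(x1*y)*np.sum(x1*x2)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b1, b2, b0)   # approximately 0.0381, 0.833, 28.1
```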
Solution (variances and standard errors of the estimates):
Var(β̂₁) = σ̂ᵤ²·Σx₂ᵢ² / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²),   Var(β̂₂) = σ̂ᵤ²·Σx₁ᵢ² / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
The sum of squared residuals from the fitted equation is ΣUᵢ² = 21.4286.
Hence σ̂ᵤ² = 21.4286/(7 – 3) = 5.3572
Var(β̂₁) = (5.3572)(400) / [(280,000)(400) – (7,000)²] = 0.000034
S(β̂₁) = √0.000034 = 0.0058
Var(β̂₂) = (5.3572)(280,000)/63,000,000 = 0.02381
S(β̂₂) = √0.02381 = 0.1543
R² = (β̂₁Σx₁ᵢyᵢ + β̂₂Σx₂ᵢyᵢ)/Σyᵢ² = [(0.0381)(16,500) + (0.833)(600)]/1,150 = 0.98
Interpretation: 98% of the variation in yield is explained by the regression plane (i.e., by
variation in the amounts of fertilizer and rainfall). The model is a good fit.
For β̂₁: t_calculated = β̂₁/S(β̂₁) = 0.0381/0.0058 ≈ 6.6
t_tabulated = t₀.₀₂₅(7 – 3) = 2.78, which can be found from the statistical table (t-distribution).
Decision: Since t_calculated > t_tabulated, we reject H₀.
That is, β̂₁ is statistically significant: the variable X₁, fertilizer, significantly affects yield.
For β̂₂: t_calculated = β̂₂/S(β̂₂) = 0.833/0.1543 ≈ 5.4, and t_tabulated = t₀.₀₂₅(7 – 3) = 2.78.
Decision: Since t_calculated > t_tabulated, we reject H₀; β̂₂ is statistically significant.
The 95% confidence interval for β₁ is 0.0219 < β₁ < 0.0542.
Interpretation: The value of the true population parameter β₁ will lie between 0.0219 and
0.0542 in 95 out of 100 cases.
Note: The coefficients of X₁ and X₂ (β̂₁ and β̂₂) measure partial effects. For example, β̂₁
measures the change in the mean value of Y per unit change in X₁, holding X₂ constant.
Let us now turn our attention to the problem of forecasting the value of the dependent variable
for a given set of values of the explanatory variables. Suppose the given values of the
explanatory variables be X01, X02, X03,…, X0k, and let the corresponding value of the dependent
variable be Y0. Now we are interested in forecasting Y0.
For the three-variable case, the point forecast can be found as follows:
Ŷ₀ = β̂₀ + β̂₁X₀₁ + β̂₂X₀₂
Example 1. Consider the example in section 3.6 (Table 3.6.1). An interval forecast is given by
Ŷ₀ ± t_α/2·S(Ŷ₀)
where S(Ŷ₀) is the standard error of the forecast value, which can be found by using the following
formula:
S(Ŷ₀) = S·√(1 + X₀ᵀ(XᵀX)⁻¹X₀)
where S² = ΣUᵢ²/(n – k) = UᵀU/(n – k) = (YᵀY – β̂ᵀXᵀY)/(n – k)
and X₀ᵀ = [X₀₁, X₀₂, …, X₀k]
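A Python sketch of the point and interval forecast, continuing the Table 3.6.1 data; the forecast point X₀ = (350, 25) is hypothetical, chosen only for illustration:

```python
import numpy as np

# Interval forecast for the Table 3.6.1 example.
X = np.column_stack([np.ones(7),
                     [100, 200, 300, 400, 500, 600, 700],   # fertilizer
                     [10, 20, 10, 30, 20, 20, 30]])         # rainfall
Y = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y
U = Y - X @ beta
n, k = X.shape
S = np.sqrt(np.sum(U**2) / (n - k))          # square root of sigma-hat squared

X0 = np.array([1.0, 350.0, 25.0])            # hypothetical X01, X02 (and the 1)
Y0_hat = X0 @ beta                           # point forecast
se_f = S * np.sqrt(1 + X0 @ XtX_inv @ X0)    # standard error of the forecast
t_crit = 2.776                               # t_{0.025}(4) from the t-table
print(Y0_hat - t_crit * se_f, Y0_hat + t_crit * se_f)   # 95% interval
```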
Note: By order we mean the number of secondary subscripts.
Interpretation: For example, r₁₂.₃ shows that, holding X₃ constant, there is a positive or
negative association between Y and X₂.
A method of point estimation with some stronger theoretical properties than the method of OLS
is the method of maximum likelihood (ML),
where f(Yᵢ) = [1/(σ√(2π))]·exp{–(Yᵢ – β₀ – β₁Xᵢ)²/(2σ²)} ………………(2)
is the density function of a normally distributed variable with the given mean and
variance. The joint density of Y₁, Y₂, …, Yₙ is
f(Y₁, Y₂, …, Yₙ | β₀ + β₁Xᵢ, σ²) = [1/(σⁿ(√(2π))ⁿ)]·exp{–Σ(Yᵢ – β₀ – β₁Xᵢ)²/(2σ²)} …………….(3)
If Y₁, Y₂, …, Yₙ are known or given, but β₀, β₁ and σ² are not known, the function in (3) is
called a likelihood function, denoted by LF(β₀, β₁, σ²).
The method of maximum likelihood, as the name indicates, consists in estimating the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (or
maximum) as possible. Therefore, we have to find the maximum of the likelihood function (4).
Using your knowledge of differential calculus, the log-likelihood is
ln LF = –n·ln σ – (n/2)·ln(2π) – ½Σ(Yᵢ – β₀ – β₁Xᵢ)²/σ² ……………(5)
      = –(n/2)·ln σ² – (n/2)·ln(2π) – ½Σ(Yᵢ – β₀ – β₁Xᵢ)²/σ² ………………(6)
Differentiating partially with respect to β₀, β₁ and σ², and setting the results to zero, we obtain
∂lnLF/∂β₀ = –(1/σ²)Σ(Yᵢ – β₀ – β₁Xᵢ)(–1) = 0 ………………………(7)
∂lnLF/∂β₁ = –(1/σ²)Σ(Yᵢ – β₀ – β₁Xᵢ)(–Xᵢ) = 0 ………………………(8)
∂lnLF/∂σ² = –n/(2σ²) + (1/(2σ⁴))Σ(Yᵢ – β₀ – β₁Xᵢ)² = 0 …………………..(9)
The above equations can be rewritten as (letting β̃₀, β̃₁ and σ̃² denote the ML estimators):
(1/σ̃²)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ) = 0 ……………………………………(10)
(1/σ̃²)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)Xᵢ = 0 …………………………………(11)
–n/(2σ̃²) + (1/(2σ̃⁴))Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)² = 0 ………………………….(12)
After simplifying: ΣYᵢ = nβ̃₀ + β̃₁ΣXᵢ ………………………………(13)
ΣYᵢXᵢ = β̃₀ΣXᵢ + β̃₁ΣXᵢ² …………………………….(14)
These are precisely the normal equations of the least squares theory obtained in unit 2.
Therefore, the ML estimators, the β̃’s, are the same as the OLS estimators.
From equation (12), σ̃² = (1/n)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)² = (1/n)ΣÛᵢ²
This differs from the OLS estimator σ̂² = ΣÛᵢ²/(n – 2),
which was shown to be an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The
magnitude of this bias can be easily determined as follows:
E(σ̃²) = (1/n)E(ΣÛᵢ²) = [(n – 2)/n]σ² = σ² – (2/n)σ²
which shows that σ̃² is biased downward (i.e., it underestimates the true σ²) in small samples.
But notice that as n, the sample size, increases indefinitely, the second term above, (2/n)σ², the
bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃² is
unbiased too.
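This small-sample bias is easy to see by simulation. The following sketch (hypothetical data-generating process and seed) compares the average of σ̃² = ΣÛᵢ²/n with that of the unbiased ΣÛᵢ²/(n – 2):

```python
import numpy as np

# Simulation sketch: the ML estimator of sigma^2 divides by n and is
# biased downward in small samples; dividing by n - 2 removes the bias.
rng = np.random.default_rng(1)
n, sigma2, reps = 20, 4.0, 20000
X = np.linspace(1, 10, n)
ml, ols = [], []
for _ in range(reps):
    Y = 1.0 + 0.5 * X + rng.normal(0, np.sqrt(sigma2), n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
    b0 = Y.mean() - b1 * X.mean()
    rss = np.sum((Y - b0 - b1 * X)**2)
    ml.append(rss / n)          # ML estimator (biased)
    ols.append(rss / (n - 2))   # unbiased estimator
print(np.mean(ml), np.mean(ols))   # about sigma2*(n-2)/n = 3.6 versus about 4.0
```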
3.9 SUMMARY
3. Normality: Uᵢ ~ N(0, σᵤ²)
4. No serial correlation (serial independence of the U’s): Cov(Uᵢ, Uⱼ) = 0 for i ≠ j
5. Independence of Uᵢ and Xᵢ: Cov(Uᵢ, X₁ᵢ) = Cov(Uᵢ, X₂ᵢ) = 0
6. No collinearity between the X variables (No multicollinearity)
7. Correct specification of the model
Formulas for the parameters:
β̂₀ = Ȳ – β̂₁X̄₁ – β̂₂X̄₂
β̂₁ = (Σx₁ᵢyᵢ·Σx₂ᵢ² – Σx₂ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
β̂₂ = (Σx₂ᵢyᵢ·Σx₁ᵢ² – Σx₁ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
Var(β̂₀) = σ̂ᵤ²·[1/n + (X̄₁²Σx₂² + X̄₂²Σx₁² – 2X̄₁X̄₂Σx₁x₂) / (Σx₁²·Σx₂² – (Σx₁x₂)²)]
Var(β̂₁) = σ̂ᵤ²·Σx₂² / (Σx₁²·Σx₂² – (Σx₁x₂)²)
Var(β̂₂) = σ̂ᵤ²·Σx₁² / (Σx₁²·Σx₂² – (Σx₁x₂)²)
where σ̂ᵤ² = ΣUᵢ²/(n – k), k being the total number of parameters that are estimated, and
x₁ and x₂ are in deviation form.
The multiple coefficient of determination (R²) measures the proportion of the variation in Y
explained by the variables X₁ and X₂ jointly:
R²y.X₁X₂ = Σŷᵢ²/Σyᵢ² = Σ(Ŷᵢ – Ȳ)²/Σ(Yᵢ – Ȳ)² = 1 – ΣUᵢ²/Σyᵢ² = 1 – RSS/TSS
The adjusted R²:
R̄² = 1 – [ΣUᵢ²/(n – k)] / [Σyᵢ²/(n – 1)]   or   R̄² = 1 – (1 – R²)·(n – 1)/(n – k)
Hypothesis testing: H₀: βᵢ = 0 against H₁: βᵢ ≠ 0 (or one-sided: βᵢ > 0, βᵢ < 0)
F_cal = [Σŷᵢ²/(k – 1)] / [ΣUᵢ²/(n – k)] = [ESS/(k – 1)] / [RSS/(n – k)]
which follows the F distribution with k – 1 and n – k df.
Decision Rule: If F_calculated > F_tabulated (F(k – 1, n – k)), reject H₀; otherwise you may accept it.
Equivalently, F = [R²/(k – 1)] / [(1 – R²)/(n – k)]
Forecasting
Point forecast vs. interval forecast (the forecasted value will lie in the interval (a, b)).
The 95% confidence interval for Y₀ (forecasted value) can be constructed by making use of
(Ŷ₀ – Y₀)/S(Ŷ₀) ~ t(n – k)   and   P(–t_α/2 < t < t_α/2) = 1 – α
The method of maximum likelihood, as the name indicates, consists in estimating the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (or
maximum) as possible. The ML estimators, the β̃’s, are the same as the OLS estimators.
B) Answer: R² = 0.894
Interpretation: The variables X₁ and X₂ explain 89% of the total variation in Y.
C) Answer: Var(β̂₁) = 6.53, Var(β̂₂) = 0.0001
S(β̂₁) = 2.55, S(β̂₂) = 0.01
D) Answer: β̂₀ and β̂₁ are statistically significant;
β̂₂ is not statistically significant.
E) Answer: (a) –13.22075 < β₁ < –1.15925
(b) –0.00965 < β₂ < 0.03765
1. The following table shows observations on the quantity of oranges sold (Y), price in cents
(X₁), and advertising expenditures (X₂).
Quantity (Y)   Price (X₁)   Advertising expenditure (X₂)
55             100          5.5
70             90           6.3
2. The following results were obtained from a sample of 12 firms on their output (Y), labor
input (X₁) and capital input (X₂), measured in arbitrary units.
ΣY = 753    ΣY² = 48,139    ΣYX₁ = 40,830
ΣX₁ = 643   ΣX₁² = 34,843   ΣYX₂ = 6,796
ΣX₂ = 106   ΣX₂² = 976      ΣX₁X₂ = 5,779
a) Find the least squares equation of Y on X1 and X2. What is the economic meaning of
your coefficients?
b) Given the following sample values of output (Y), compute the standard errors of the
estimates and test their statistical significance.
Firms A B C D E F G H I J K L
Output 64 71 53 67 55 58 77 57 56 51 76 68
c) Find the multiple correlation coefficient and the unexplained variation in output
d) Construct 99 percent confidence intervals for the population parameters.
3. From the following data estimate the partial regression coefficients, their standard errors, and
the adjusted and unadjusted R² values.
Σ(Yᵢ – Ȳ)(X₃ᵢ – X̄₃) = 4250.900    Σ(X₂ᵢ – X̄₂)(X₃ᵢ – X̄₃) = 4796.000
n = 15
4. The following represents the true relationship between the independent variables
X1, X2, X3, and the dependent variable Y
Yi= bo+b1X1i+b2X2i+b3X3i+Ui
5. There are occasions when the two-variable linear regression model assumes the following
form:
Yᵢ = βXᵢ + Eᵢ
where β is the parameter and E is the disturbance term. In this model the intercept
term is zero. The model is therefore known as regression through the origin.
i) Show that the least squares estimator is β̂ = ΣXᵢYᵢ/ΣXᵢ²
8. The quantity demanded of a commodity Y is assumed to be a linear function of its price X. The
following results have been obtained from a sample of 10 observations.
Price in Birr (X):    15   13   12   12   9    7    7    4    6    3
Quantity in kg (Y):   760  775  780  785  790  795  800  810  830  840
i) Show that Cov(b₀, b₁) = –X̄σᵤ²/Σxᵢ² under the basic assumptions of the linear regression model.
ii) Show that the estimated regression line passes through the mean values of X and Y.
13. Assume that the quantity supplied of a commodity Y is a linear function of its price (X₁)
and the wage rate of labor used (X₂) in the production of the commodity. The sample values are
summarized as follows:
ΣY = 1,282   ΣY² = 132,670   ΣX₁Y = 53,666
ΣX₁ = 545    ΣX₁² = 22,922   ΣX₂Y = 5,707
ΣX₂ = 86     ΣX₂² = 617      ΣX₁X₂ = 2,568
i) Using OLS estimate the parameters of the model
ii) Estimate the elasticities and interpret your results
iii) Forecast the supply for a particular commodity at X1=32 and X2=10, set a 95%
confidence interval for the forecasted value.
iv) Test the overall significance of the supply function
CONTENT
4.0 Aims and Objective
4.1 Introduction
4.2 Zero Expected Disturbances
4.3 Homoscedasticity
4.4 Autocorrelation
4.5 Multicollinearity
4.6 Summary
4.7 Answers to Check Your Progress
4.8 Model Examination
4.9 Reference
The aim of this unit is to show the reader what is meant by violation of the basic econometric assumptions that formed
the basis of the classical linear regression model. After completing this unit, the student will understand
the consequences of, tests for, and remedies to such violations.
4.1 INTRODUCTION
It can be shown that by using these observations we would get a bad estimate of the true line. If
the true line lies below or above the observations, the estimated line would be biased.
The above figure shows that the estimated line Ŷ is not a good approximation to the true line,
E(Y).
Note that there is no test for the verification of this assumption because the assumption E(U) =
0 is forced upon us if we are to establish the true relationship. That is, we set E(U) = 0 at the
outset of our estimation procedure. Its plausibility should be examined in each particular case
on a priori grounds. In any econometric application we must be sure that the following things
are fulfilled so as to be safe from violating the assumption of E(U) = 0
i) All the important variables have been included into the function.
ii) There are no systematically positive or systematically negative errors of
measurement in the dependent variable.
The assumption of homoscedasticity requires that the variance of Uᵢ
around its zero mean does not depend on the value of X; that is, σᵤᵢ² ≠ f(Xᵢ). Consider the
following diagram.
In figure (b) we picture the case of (monotonically) increasing variance of the Uᵢ’s: as X increases,
so does the variance of U. This is a common form of heteroscedasticity assumed in econometric
applications. That is, the larger an independent variable, the larger the variance of the
associated disturbance. Various examples can be stated in support of this argument. For
instance, if consumption is a function of the level of income, at higher levels of income (the
independent variable) there is greater scope for the consumer to act on whims and deviate by
larger amounts from the specified consumption relationship. The following diagram depicts this
case.
[Figure: consumption plotted against income, showing a wider spread of observations at high income than at low income]
Note, however, that the problem of heteroscedasticity is more a problem of cross-sectional data
than of time series data; that is, the problem is more serious in cross-sectional data.
B) Causes of Heteroscedasticity
Heteroscedasticity can arise for several reasons. The first one is the presence of
outliers (i.e., extreme values compared to the majority of a variable). The inclusion or exclusion
of such an observation, especially if the sample size is small, can substantially alter the results
of regression analysis. With outliers it would be hard to maintain the assumption of
homoscedasticity.
Another source of heteroscedasticity arises from violating the assumption that the regression
model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact
that some important variables are omitted from the model. In such situation the residuals
obtained from the regression may give the distinct impression that the error variance may not
be constant. But if the omitted variables are included in the model, the impression may
disappear.
i) If U is heteroscedastic, the OLS estimates do not have the minimum variance property in
the class of unbiased estimators; that is, they are inefficient in small samples.
Furthermore, they are inefficient in large samples.
ii) The coefficient estimates would still be statistically unbiased; that is, the expected value
of the β̂’s equals the true parameters.
In figure (a) we see that there is no systematic relationship between the two variables,
suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) and (c), however,
suggest a linear relationship between the two variables; in particular, figure (c) reveals or
suggests that the heteroscedastic variance may be proportional to the value of Y or X. Figures (d)
and (e) indicate a quadratic relationship between Ûᵢ² and Ŷᵢ or X. This knowledge may help
us in transforming our data in such a manner that in the regression on the transformed data the
variance of the disturbance is homoscedastic. Note that this visual inspection method is also
known as the informal method. The following tests follow the formal method.
Since σᵢ² is generally not known, Park suggests using Ûᵢ² as a proxy and running the following
regression:
ln Ûᵢ² = ln σ² + β·ln Xᵢ + vᵢ
       = α + β·ln Xᵢ + vᵢ .......................................(4.3)
If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in
the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity.
The Park test is thus a two-stage procedure. In the first stage we run the OLS regression
disregarding the heteroscedasticity question. We obtain Ûᵢ from this regression, and then in the
second stage we run the regression stated in (4.3).
Example: Consider a relationship between compensation (Y) and productivity (X). To
illustrate the Park approach, the following regression function is used:
Yᵢ = β₀ + β₁Xᵢ + Uᵢ .......................................(4.4)
Suppose data on Y and X are used to come up with the following result:
Ŷᵢ = 1992.34 + 0.23Xᵢ
s.e. = (936.48) (0.09)
t = (2.13) (2.33),  r² = 0.44
Suppose that the residuals obtained from the above regression, were regressed on Xi as
suggested in (4.3), giving the following results.
As shown in the above result (t value), the coefficient of ln Xᵢ is not significant. That is, there is
no statistically significant relationship between the two variables. Following the Park test, one
may conclude that there is no heteroscedasticity in the error variance.
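The two stages of the Park test can be sketched in Python as follows (hypothetical data; the error variance is made to grow with X by construction so that the test has something to detect):

```python
import numpy as np

# A sketch of the two-stage Park test on hypothetical data.
rng = np.random.default_rng(2)
n = 40
X = rng.uniform(1, 10, n)
Y = 2 + 3 * X + rng.normal(0, X)        # error sd grows with X by design

# Stage 1: OLS of Y on X, keep the residuals.
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
U = Y - A @ b

# Stage 2: regress ln(U^2) on ln(X); a significant slope suggests
# heteroscedasticity, as in (4.3).
Z = np.column_stack([np.ones(n), np.log(X)])
w = np.log(U**2)
g = np.linalg.lstsq(Z, w, rcond=None)[0]
resid = w - Z @ g
s2 = np.sum(resid**2) / (n - 2)
se_slope = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
print(g[1] / se_slope)    # compare with the t critical value, df = n - 2
```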
rₛ = 1 – 6·Σdᵢ²/[n(n² – 1)] ........................................... (4.5)
where dᵢ = the difference in the ranks assigned to two different characteristics of the i-th individual or
phenomenon and n = the number of individuals or phenomena ranked. The steps required in this
test are stated as follows.
Assume Yᵢ = β₀ + β₁Xᵢ + Uᵢ
Step 1. Fit the regression to the data on Y and X and obtain the residuals Ûᵢ.
Step 2. Ignoring the sign of Ûᵢ, that is, taking their absolute value |Ûᵢ|, rank both |Ûᵢ|
and Xᵢ (or Ŷᵢ) according to an ascending or descending order and compute the
Spearman rank correlation coefficient given previously, (4.5).
Step 3. Assuming that the population rank correlation coefficient ρₛ is zero and n > 8,
the significance of the sample rₛ can be tested by the t test as follows:
t = rₛ·√(n – 2)/√(1 – rₛ²) ........................................... (4.6)
with df = n – 2.
If the computed t value exceeds the critical t value, we may accept the hypothesis of
heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X
variable, rₛ can be computed between |Ûᵢ| and each of the X variables separately and can be
tested for statistical significance by the t test given above.
Example: To illustrate the rank correlation test, consider the regression Yᵢ = β₀ + β₁Xᵢ. Suppose
10 observations are used to fit this equation. The following table makes use of the rank correlation test.

Observation   Y     X     Ŷ      Û = (Y – Ŷ)   Rank of |Û|   Rank of X   d    d²
1             12.4  12.1  11.37   1.03          9             4           5    25
2             14.4  21.4  15.64  –1.24         10             9           1     1
3             14.6  18.4  14.40   0.20          4             7          –3     9
4             16.0  21.7  15.78   0.22          5            10          –5    25
5             11.3  12.5  11.56  –0.26          6             5           1     1
6             10.0  10.4  10.59  –0.59          7             2           5    25
7             16.2  20.8  15.37   0.83          8             8           0     0
8             10.4  10.2  10.50  –0.10          3             1           2     4
9             13.1  16.0  13.16  –0.06          2             6          –4    16
10            11.3  12.0  11.33  –0.03          1             3          –2     4
Total                                                                     0   110

rₛ = 1 – 6(110)/[10(100 – 1)] = 0.33
t = 0.33·√8/√(1 – 0.1089) = 0.99
Note that for 8 (= 10 – 2) df this t value is not significant even at the 10% level of significance.
Thus, there is no evidence of a systematic relationship between the explanatory variable and the
absolute value of the residuals, which might suggest that there is no heteroscedasticity.
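The rank-correlation arithmetic of the table can be reproduced directly (using the rounded rₛ = 0.33 the text obtains t ≈ 0.99; the exact rₛ = 1/3 gives t ≈ 1.0):

```python
import numpy as np

# Reproducing the rank-correlation computation from the table above.
d2_sum, n = 110, 10
rs = 1 - 6 * d2_sum / (n * (n**2 - 1))            # = 0.3333 (text rounds to 0.33)
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)      # about 1.0; 0.99 with rs = 0.33
print(rs, t)   # not significant, so no evidence of heteroscedasticity
```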
The Goldfeld-Quandt test proceeds in the following steps.
Step I: The observations are ordered according to the magnitude of the independent variable
thought to be related to the variance of the disturbances.
Step II: A certain number of central observations (represented by c) are omitted, leaving two
equal-sized groups of observations, one group corresponding to low values of the chosen
independent variable and the other group corresponding to high values. Note that the
observations are omitted to sharpen or accentuate the difference between the small variance and
the large variance group.
Step III: We fit separate regressions to each sub-sample, obtain the sum of squared
residuals from each of them, and form the ratio of their sums of squared residuals. Here
ΣÛ₁² = the residual sum of squares from the sub-sample of low values of X, with [(n – c)/2] – k
degrees of freedom, and ΣÛ₂² = that from the sub-sample of high values of X, with the same
degrees of freedom.
If each of these sums is divided by the appropriate degrees of freedom, we obtain estimates of
the variances of the Û’s in the two sub-samples.
Step IV: Compute the ratio of the two variances:
F* = [ΣÛ₂²/((n – c – 2k)/2)] / [ΣÛ₁²/((n – c – 2k)/2)] = ΣÛ₂²/ΣÛ₁² .........................................(4.7)
This has an F distribution (with numerator and denominator degrees of freedom each equal to (n – c – 2k)/2,
where n = total number of observations, c = central observations omitted, k = number of
parameters estimated from each regression). If the two variances are the same (that is, if the
Û’s are homoscedastic), the value of F* will tend to one. If the variances differ, F* will have a
large value (given that by the design of the test ΣÛ₂² > ΣÛ₁²). Generally, the observed F* is
compared with the theoretical value of F with (n – c – 2k)/2 degrees of freedom (at a chosen level
of significance).
Example: Suppose that we have data on consumption expenditure in relation to income for a
cross section of 30 families. Suppose we postulate that consumption expenditure is linearly
related to income but that heteroscedasticity is present in the data. Suppose further that the
middle 4 observations are dropped after the necessary reordering of the data, and that we
obtain the following result after performing a separate regression on each of the two 13-
observation groups:
F* = (1536.8/11) / (377.17/11) = 4.07
Note from the F table in the appendix that the critical F value for 11 numerator and 11
denominator df at the 5% level is 2.82. Since the estimated F* value exceeds the critical value,
we may conclude that there is heteroscedasticity in the error variance.
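A sketch of the Goldfeld-Quandt mechanics on hypothetical data (the error standard deviation is made proportional to X so that heteroscedasticity is present by construction):

```python
import numpy as np

# Goldfeld-Quandt sketch: order by X, drop c central observations,
# fit OLS to each half and compare residual sums of squares.
def gq_fstar(X, Y, c):
    order = np.argsort(X)
    X, Y = X[order], Y[order]
    m = (len(X) - c) // 2
    def rss(x, y):
        A = np.column_stack([np.ones(len(x)), x])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        return np.sum((y - A @ b)**2)
    # By the design of the test the high-X group goes in the numerator.
    return rss(X[-m:], Y[-m:]) / rss(X[:m], Y[:m])

rng = np.random.default_rng(3)
X = rng.uniform(1, 10, 30)
Y = 5 + 2 * X + rng.normal(0, X)      # heteroscedastic by construction
print(gq_fstar(X, Y, 4))   # compare with the F(11, 11) critical value 2.82
```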
Note, however, that the ability of the Goldfeld-Quandt test to perform successfully depends on
how c is chosen. Moreover, its success depends on identifying the correct X (i.e., independent)
variable with which to order the observations. This limitation of the test can be avoided if we
consider the Breusch-Pagan-Godfrey (BPG) test.
The BPG test assumes that σᵢ² is some function of the non-stochastic variables Z; some or all of the X’s can serve
as Z’s. Specifically, assume that
σᵢ² = α₀ + α₁Z₁ᵢ + … + αₘZₘᵢ ..........................................(4.10)
Step 1. Estimate the regression by OLS and obtain the residuals Ûᵢ.
Step 2. Obtain σ̃² = ΣÛᵢ²/n. Note that this is the maximum likelihood estimator of σ².
(Recall from the previous discussion in unit two that the OLS estimator is ΣÛᵢ²/(n – k).)
Example: Suppose we have 30 observations on Y and X that give the following
regression results (after performing the first steps of the test).
Step 4. Assuming that the pᵢ are linearly related to Xᵢ (= Zᵢ), we obtain the following
regression result:
p̂ᵢ = –0.74 + 0.01Xᵢ,  ESS = 10.42
Step 5. Θ = ½(ESS) = 5.21
From the Chi-square table we find that for 1 df the 5% critical Chi-square value is 3.84. Thus,
the observed Chi-square value is significant at the 5% level of significance, suggesting
heteroscedasticity in the error variance.
Note that the BPG test is asymptotic, that is, a large-sample test. The test is sensitive in small
samples with regard to the assumption that the disturbances vᵢ are normally distributed.
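The BPG steps can be sketched as follows; the construction pᵢ = Ûᵢ²/σ̃² in step 3, which the quoted step 4 presupposes, is filled in here (hypothetical data, Z = X):

```python
import numpy as np

# A sketch of the BPG steps quoted above.
rng = np.random.default_rng(4)
n = 30
X = rng.uniform(1, 10, n)
Y = 3 + 2 * X + rng.normal(0, 0.5 * X)     # heteroscedastic by construction

# Step 1: OLS residuals.
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
U = Y - A @ b

sigma2_ml = np.sum(U**2) / n               # Step 2: ML estimator of sigma^2
p = U**2 / sigma2_ml                       # Step 3: construct p_i (assumed form)
g = np.linalg.lstsq(A, p, rcond=None)[0]   # Step 4: regress p on Z (= X)
ess = np.sum((A @ g - p.mean())**2)        # explained sum of squares
theta = 0.5 * ess                          # Step 5: test statistic
print(theta)    # compare with the chi-square(1) critical value 3.84
```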
Assumption one: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ,
suppose that we assume the error variance is proportional to Xᵢ². That is,
E(Uᵢ²) = σ²Xᵢ²
If, as a matter of “speculation” or from graphical methods, it is believed that the variance of Uᵢ
is proportional to the square of the explanatory variable X, one may transform the original
model as follows. Divide the original model through by Xᵢ to obtain
Yᵢ/Xᵢ = β₀(1/Xᵢ) + β₁ + Uᵢ/Xᵢ
      = β₁ + β₀(1/Xᵢ) + Vᵢ ............................................... (4.11)
where Vᵢ is the transformed disturbance term, equal to Uᵢ/Xᵢ. Now it is easy to verify that
E(Vᵢ²) = E(Uᵢ/Xᵢ)² = (1/Xᵢ²)·E(Uᵢ²) = σ²
Thus the variance of Vᵢ is homoscedastic and one may proceed to apply OLS to the transformed
equation. Notice that in the transformed regression the intercept term β₁ is the slope coefficient
in the original equation and the slope coefficient β₀ is the intercept term in the original model.
Therefore, to get back to the original model we shall have to multiply the estimated (4.11) by Xᵢ.
Assumption two: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ, suppose that we assume the error
variance to be proportional to Xᵢ. That is,
E(Uᵢ²) = σ²Xᵢ
This requires the square root transformation. If graphical inspection suggests that the
variance of Uᵢ is proportional to Xᵢ, the original model can be transformed by dividing it
by √Xᵢ. That is,
Yᵢ/√Xᵢ = β₀(1/√Xᵢ) + β₁√Xᵢ + Uᵢ/√Xᵢ ............................................... (4.12)
With Vᵢ = Uᵢ/√Xᵢ, it is easy to verify that
E(Vᵢ²) = (1/Xᵢ)·E(Uᵢ²) = σ²
Therefore, one may proceed to apply OLS to the transformed equation. Note an important
feature of the transformed model: it has no intercept term. Therefore, one will have to use the
regression-through-the-origin model to estimate β₀ and β₁. Having run the regression on the
transformed model (4.12), one can get back to the original model simply by multiplying it by
√Xᵢ.
Assumption three: A log transformation such as
ln Yᵢ = β₀ + β₁·ln Xᵢ + Uᵢ
very often reduces heteroscedasticity when compared with the regression
Yᵢ = β₀ + β₁Xᵢ + Uᵢ
This result arises because log transformation compresses the scales in which the variables are
measured. For example, log transformation reduces a ten-fold difference between two values
(such as between 8 and 80) into roughly a two-fold difference (because ln 80 = 4.38 and ln 8 = 2.08).
To conclude, the remedial measures explained above through transformation point out that we
are essentially speculating about the nature of σᵢ². Note also that the OLS estimators obtained
from the transformed equations are BLUE. Which of the transformations discussed will work
will depend on the nature of the problem and the severity of heteroscedasticity. Moreover, we
may not know a priori which of the X variables should be chosen for transforming the data in
the case of a multiple regression model. In addition, log transformation is not applicable if some
of the Y or X values are zero or negative.
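The first transformation can be illustrated as follows (hypothetical data with the error variance proportional to Xᵢ²); note how the intercept and slope switch roles in the transformed regression, as described above:

```python
import numpy as np

# Sketch of the remedial transformation for assumption one:
# divide the model through by X_i and run OLS on Y/X against 1 and 1/X.
rng = np.random.default_rng(5)
n = 50
X = rng.uniform(1, 10, n)
Y = 4 + 1.5 * X + rng.normal(0, X)      # error sd proportional to X

Ystar = Y / X
A = np.column_stack([np.ones(n), 1.0 / X])   # regressors: 1 and 1/X
g = np.linalg.lstsq(A, Ystar, rcond=None)[0]
b1_hat, b0_hat = g[0], g[1]    # roles are swapped, as noted in the text
print(b0_hat, b1_hat)          # estimates of the original intercept and slope
```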
1. State with brief reasons whether the following statements are true, false, or uncertain.
a) In the presence of heteroscedasticity OLS estimators are biased as well as
inefficient.
b) If heteroscedasticity is present, the conventional t and F tests are invalid.
2. State three consequences of heteroscedasticity.
3. List and explain the steps of the BPG test.
4. Suppose that you have data on personal saving and personal income of Ethiopia for a 31-
year period. Assume that graphical inspection suggests that the Uᵢ's are heteroscedastic, so
that you want to employ the Goldfeld-Quandt test. Suppose you ordered the
observations in ascending order of income and omitted the nine central observations.
Applying OLS to each subset, you obtained the following results.
a) For subset I:    Ŝ₁ = –738.84 + 0.008Iᵢ,   ΣÛ₁² = 144,771.5
b) For subset II:   Ŝ₂ = 1141.07 + 0.029Iᵢ,  ΣÛ₂² = 769,899.2
Is there any evidence of heteroscedasticity?
4.4 AUTOCORRELATION
A. The Nature of Autocorrelation
An important assumption of the classical linear model is that there is no autocorrelation or
serial correlation among the disturbances Ui entering into the population regression function.
This assumption implies that the covariance of Uᵢ and Uⱼ is equal to zero. That is:
Cov(Uᵢ, Uⱼ) = E{[Uᵢ – E(Uᵢ)][Uⱼ – E(Uⱼ)]} = E(UᵢUⱼ) = 0 for i ≠ j
Since autocorrelated errors arise most frequently in time series models, the discussion in the
rest of this unit is couched in terms of time series data.
There are a number of time-series patterns or processes that can be used to model correlated
errors. The most common is what is known as the first-order autoregressive, or AR(1),
process. Consider
Yₜ = β₀ + β₁Xₜ + Uₜ
where t denotes the observation at time t (i.e., time series data). With this, one can assume that the disturbances
are generated as follows:
Uₜ = ρUₜ₋₁ + εₜ
where ρ is known as the coefficient of autocovariance and εₜ is a stochastic disturbance
satisfying the standard OLS assumptions, namely
E(εₜ) = 0
Var(εₜ) = σε²
Cov(εₜ, εₜ₊ₛ) = 0 for s ≠ 0
where the subscript s represents the length of the lag. It can be shown that
ρ = Cov(Uₜ, Uₜ₋₁)/Var(Uₜ),  with –1 < ρ < 1
Hence, ρ (rho) is the simple correlation of the successive errors of the original model.
Note that when ρ > 0 successive errors are positively correlated and when ρ < 0 successive
errors are negatively correlated. It can be shown that corr(Uₜ, Uₜ₋ₛ) = ρˢ (where s represents the
length of the lag). This implies that the correlation (be it negative or positive) between any two
periods diminishes as time goes by, i.e., as s increases.
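The AR(1) scheme and the decay of corr(Uₜ, Uₜ₋ₛ) = ρˢ can be checked by simulation (hypothetical ρ and seed):

```python
import numpy as np

# Generate an AR(1) disturbance U_t = rho * U_{t-1} + eps_t and check
# that corr(U_t, U_{t-s}) decays roughly like rho^s.
rng = np.random.default_rng(6)
rho, n = 0.7, 5000
eps = rng.normal(0, 1, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho * U[t - 1] + eps[t]

for s in (1, 2, 3):
    r = np.corrcoef(U[s:], U[:-s])[0, 1]
    print(s, r, rho**s)     # sample correlation versus theoretical rho^s
```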
b) Consequences of Autocorrelation
When the disturbance term exhibits serial correlation the value as well as the standard errors of
the parameter estimates are affected.
Notice from the diagram that the OLS estimated line gives a better fit to the data than the true
relationship. This reveals why in this context r² is overestimated and σᵤ² (and the variance of the
OLS estimates) is underestimated. When the standard errors of the β̂’s are biased downwards, this
leads to confidence intervals which are much too narrow. Moreover, the parameter estimate of an
irrelevant explanatory variable may be highly significant. In other words, the figure reveals that the
estimated error terms Ûᵢ are closer to the regression line than are the U’s to the true line, and
thus we would have a serious underestimation of σᵤ².
Note that since the population disturbances Uₜ cannot be observed directly, we use their proxy,
the residuals Ûₜ, which can be obtained from the usual OLS procedure. The examination of Ûₜ
can provide useful information not only about autocorrelation but also about heteroscedasticity,
model inadequacy, or specification bias.
i) Graphical Method
Some rough idea about the existence of autocorrelation may be gained by plotting the residuals
either against time or against their own lagged variables.
For instance, suppose plotting the residual against its lagged variable bring about the following
relationship.
[Figure: scatter of Ûₜ against Ûₜ₋₁, with most points in the first and third quadrants]
As the above figure reveals, most of the residuals are bunched in the first and the third
quadrants, suggesting very strongly that there is positive correlation in the residuals. However,
the graphical method we have just discussed is essentially subjective or qualitative in nature.
But there are quantitative tests that can be used to supplement the purely qualitative approach.
The most celebrated such test is based on the Durbin-Watson d statistic,
d = Σ(Ûₜ – Ûₜ₋₁)²/ΣÛₜ² ......................................................... (4.13)
which is simply the ratio of the sum of squared differences in successive residuals to the
residual sum of squares, RSS. Note that in the numerator of the d statistic the number of
observations is n – 1 because one observation is lost in taking successive differences. Note also that
expanding the above formula allows us to obtain, approximately,
d ≈ 2(1 – ρ̂) ......................................................... (4.14)
Although it is not used routinely, it is important to note the assumptions underlying the d-statistics
a) the regression model includes an intercept term
b) the explanatory variables are non-stochastic or fixed in repeated sampling
c) the disturbances Ut are generated by the first order autoregressive scheme.
Uₜ = ρUₜ₋₁ + εₜ
d) the regression model does not include lagged value(s) of the dependent variable as one
of the explanatory variables
e) there are no missing observations in the data
Note from the Durbin-Watson statistic that for positive autocorrelation (ρ > 0), successive
disturbance values will tend to have the same sign and the quantities (Uₜ – Uₜ₋₁)² will tend to be
small relative to the squares of the actual values of the disturbances. We can therefore expect
the value of the expression in equation (4.13) to be low. Indeed, for the extreme case ρ = 1 it is
possible that Uₜ = Uₜ₋₁ for all t, so that the minimum possible value of the expression is zero.
However, for negative autocorrelation, since positive disturbance values now tend to be
followed by negative ones and vice versa, the quantities (Uₜ – Uₜ₋₁)² will tend to be large relative
to the squares of the U’s. Hence, the value of (4.13) now tends to be high; the extreme case here
is ρ = –1, for which d approaches 4. Finally, when ρ = 0 we should expect, from (4.14), the d statistic
to take a value in the neighborhood of 2.
The Durbin-Watson test tests the hypothesis H₀: ρ = 0 (implying that the error terms are not
autocorrelated with a first-order scheme) against the alternative. However, the sampling
distribution of the d statistic depends on the sample size n, the number of explanatory
variables k, and also on the actual sample values of the explanatory variables. Thus, the critical
values at which we might, for example, reject the null hypothesis at the 5 percent level of
significance depend very much on the sample we have chosen. Notice that it is impracticable
to tabulate critical values for all possible sets of sample values. What is possible, however, is, for
given values of n and k, to find upper and lower bounds such that actual critical values for any
set of sample values will fall within these known limits. Tables are available which give these
upper and lower bounds for various levels of n and k and for specified levels of significance. (In
the appendices you can find the Durbin-Watson table.)
The Durbin-Watson test procedure in testing the null hypothesis of ρ = 0 against the alternative
hypothesis of positive autocorrelation is illustrated in the figure below.
Note that under the null hypothesis the actual sampling distribution of d, for the given n and k
and for the given sample X values, is shown by the unbroken curve. It is such that 5 percent of
the area beneath it lies to the left of the point d*, i.e., P(d < d*) = 0.05. If d* were known we
would reject the null hypothesis at the 5 percent level of significance if for our sample d < d*.
Unfortunately, for the reason given above, d* is unknown. The broken curves labeled dL and dU
represent, for given values of n and k, the lower and upper limits to the sampling distribution of
d within which the actual sampling distribution must lie whatever the sample X values.
[Figure: sampling distributions of d, with the bounding curves dL and dU around the actual distribution]
Note that tables for dU and dL are constructed to facilitate the use of one-tail rather than two-
tail tests. The following representation better explains the actual test procedure, which shows
that the limits of d are 0 and 4.
Note:
H₀: No positive autocorrelation
H₀*: No negative autocorrelation

0 to dL:            reject H₀ (evidence of positive autocorrelation)
dL to dU:           zone of indecision
dU to 4 – dU:       do not reject H₀ or H₀*
4 – dU to 4 – dL:   zone of indecision
4 – dL to 4:        reject H₀* (evidence of negative autocorrelation)
Note that from the above presentation we can develop the following rule of thumb: if d
is found to be close to 2 in an application, one may assume that there is no first-order
autocorrelation, either positive or negative. If d is close to 0, it is because ρ is close to 1,
indicating strong positive autocorrelation in the residuals. Similarly, the closer d is to 4, the
greater the evidence of negative serial correlation, because ρ is then close to –1.
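Computing d from OLS residuals is straightforward; the sketch below generates positively autocorrelated disturbances (hypothetical ρ) and shows that d falls well below 2:

```python
import numpy as np

# Computing the d statistic of (4.13) from OLS residuals.
def durbin_watson(U):
    return np.sum(np.diff(U)**2) / np.sum(U**2)

rng = np.random.default_rng(7)
n, rho = 20, 0.8
X = np.linspace(1, 10, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho * U[t - 1] + rng.normal()     # positively autocorrelated errors
Y = 2 + 3 * X + U

A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
res = Y - A @ b
print(durbin_watson(res))   # well below 2 suggests positive autocorrelation
```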
d) Remedial Measure
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measure.
If the source of the problem is suspected to be due to omission of important variables, the
solution is to include those omitted variables. Besides if the source of the problem is believed
to be the result of misspecification of the model, then the solution is to determine the
appropriate mathematical form.
If the above approaches are ruled out, the appropriate procedure will be to transform the
original data so that we can come up with a new form (or model) which satisfies the assumption
of no serial correlation. Of course, the transformation depends on the nature of the serial
correlation. The following stepwise procedure is commonly used.
Step 1: Apply OLS to the original model and obtain the estimated residuals Ûₜ.
Step 2: Using the estimated residuals, run the following regression:
Ûₜ = ρ̂Ûₜ₋₁ + vₜ
Step 3: Using the ρ̂ obtained from the step 2 regression, run the generalized difference equation
similar to (4.20) as follows:
(Yₜ – ρ̂Yₜ₋₁) = β₀(1 – ρ̂) + β₁(Xₜ – ρ̂Xₜ₋₁) + (Uₜ – ρ̂Uₜ₋₁)
or Yₜ* = β₀* + β₁*Xₜ* + Ûₜ*
Step 4: Since a priori it is not known that the ρ̂ obtained from the regression in step 2 is the
best estimate of ρ, substitute the values of β̂₀* and β̂₁* obtained from the regression in
step 3 into the original regression (4.21) and obtain the new residuals, say Ûₜ**, as
Ûₜ** = Yₜ – β̂₀* – β̂₁*Xₜ
Note that this can be easily computed since Yₜ, Xₜ, β̂₀* and β̂₁* are all known.
Step 5: Now estimate the regression
Ûₜ** = ρ̂̂·Ûₜ₋₁** + wₜ
where ρ̂̂ is the second-round estimate of ρ.
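One pass of steps 1-3 of this procedure can be sketched as follows (hypothetical data; steps 4-5 would repeat the same computations with the updated residuals):

```python
import numpy as np

# A sketch of one pass of the stepwise procedure above.
def ols(A, y):
    b = np.linalg.lstsq(A, y, rcond=None)[0]
    return b, y - A @ b

rng = np.random.default_rng(8)
n, rho_true = 100, 0.7
X = np.linspace(1, 10, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho_true * U[t - 1] + rng.normal()
Y = 2 + 3 * X + U

# Step 1: OLS on the original model; Step 2: estimate rho from residuals.
_, u = ols(np.column_stack([np.ones(n), X]), Y)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1]**2)

# Step 3: generalized difference equation on the transformed variables.
Ys, Xs = Y[1:] - rho_hat * Y[:-1], X[1:] - rho_hat * X[:-1]
g, _ = ols(np.column_stack([np.ones(n - 1), Xs]), Ys)
b0_hat, b1_hat = g[0] / (1 - rho_hat), g[1]   # recover original parameters
print(rho_hat, b0_hat, b1_hat)
```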
4.5 MULTICOLLINEARITY
a) The nature of the problem
One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity
among the regressors included in the regression model. Note that although the assumption is said to be violated
only in the case of exact multicollinearity (i.e., an exact linear relationship among some of the regressors), the
presence of multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating
problems important enough to warrant our treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among any of
the regressors; it depends on the existence of an approximate linear relationship in the data set
at hand. Unlike most other estimating problems, this problem is caused by the particular sample
available. Multicollinearity in the data could arise for several reasons. For example, the
independent variables may all share a common time trend, one independent variable might be
the lagged value of another that follows a trend, some independent variable may have varied
together because the data were not collected from a wide enough base, or there could in fact
exist some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity will affect seriously the parameter estimates.
Intuitively, when any two explanatory variables are changing in nearly the same way, it
becomes extremely difficult to establish the influence of each one regressors on the dependent
variable separately. That is, if two explanatory variables change by the same proportion, the
influence on the dependent variable by one of the explanatory variables may be erroneously
attributed to the other. Their effect cannot be sensibly investigated, due to the high inter
correlation.
b) Consequences of Multicollinearity
In the case of near or high multicollinearity, one is likely to encounter the following
consequences
i) Although BLUE, the OLS estimators have large variances and covariances, making
precise estimation difficult. This is clearly seen through the formula for the variance of the
estimators. For example, in a multiple linear regression, Var(β̂₁) can be written as follows:
Var(β̂₁) = σ² / [Σx₁ᵢ²(1 – r₁₂²)]
It is apparent from the above formula that as r₁₂ (which is the coefficient of correlation
between X₁ and X₂) tends towards 1, that is, as collinearity increases, the variance of the
estimator increases. The same holds for Var(β̂₂) and Cov(β̂₁, β̂₂).
ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t-ratios of one or more coefficients tend to be statistically
insignificant.
iv) Although the t-ratios of one or more coefficients are statistically insignificant, R², the overall
measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data.
That is when few observations are included, the pattern of relationship may change and
affect the result.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the
new (future) sample observation. That is, if collinearity exists on the data of the past 15
years sample, and if collinearity is expected to be the same for the future sample period,
then forecasting will not be a problem.
i) High R² but few significant t-ratios: If R² is high, say in excess of 0.8, the F test in most
cases will reject the hypothesis that the partial slope coefficients are simultaneously equal
to zero, but the individual t tests will show that none or very few of the partial slope
coefficients are statistically different from zero.
ii) High pair-wise correlations among regressors: If the pair-wise correlation coefficient
between two regressors is high, say in excess of 0.8, then multicollinearity is a serious
problem.
iii) Auxiliary regressions: Since multicollinearity arises because one or more of the
regressors are exact or approximate linear combinations of the other regressors, one way
of finding out which X variable is related to the other X variables is to regress each Xᵢ on the
remaining X variables and compute the corresponding R², which will help to decide about the
problem. For example, consider the following auxiliary regression:
Xₖ = α₁X₁ + α₂X₂ + … + αₖ₋₁Xₖ₋₁ + V
If the R² of the above regression is high, it implies that Xₖ is highly correlated with the rest
of the explanatory variables, and one may hence drop Xₖ from the model.
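The auxiliary-regression idea can be sketched as follows, with X₂ deliberately constructed as a near-linear function of X₁ (hypothetical data):

```python
import numpy as np

# Auxiliary-regression sketch: regress each X on the remaining X's and
# inspect the resulting R^2.
rng = np.random.default_rng(9)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = 2 * X1 + rng.normal(0, 0.3, n)     # nearly a linear function of X1
X3 = rng.uniform(0, 10, n)              # unrelated regressor

def aux_r2(target, others):
    A = np.column_stack([np.ones(n)] + others)
    b = np.linalg.lstsq(A, target, rcond=None)[0]
    u = target - A @ b
    return 1 - np.sum(u**2) / np.sum((target - target.mean())**2)

print(aux_r2(X2, [X1, X3]))   # close to 1: X2 is nearly collinear with X1
print(aux_r2(X3, [X1, X2]))   # close to 0: X3 is not
```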
d) Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient
estimators in which the researcher is interested have unacceptably high variances. Thus, the
econometrician should not worry about multicollinearity if the R² from the regression exceeds
the R² of any independent variable regressed on the other independent variables. Nor need the
researcher worry about multicollinearity if the t-statistics are all greater than 2. Because
multicollinearity is essentially a sample problem, there are no infallible guides; however, one
may try the following rules of thumb.
a) Increase the size of the sample: As the sample size increases, Σx₁ᵢ² will generally increase.
Thus, for any given r₁₂, the variance of β̂₁ will decrease, thus decreasing the standard error,
which will enable us to estimate β₁ more precisely.
b) Drop a variable: When faced with severe multicollinearity, one of the “simplest” things to
do is to drop one of the collinear variables. But note that in dropping a variable from the
model we may be committing a specification bias or specification error. Specification bias
arises from incorrect specification of the model used in the analysis. Thus, if economic
theory requires some variable to be included in the model, dropping it due to the
multicollinearity problem would constitute specification bias. This is because we
would be dropping a variable whose true coefficient in the equation being estimated is not
zero.
c) Transformation of variables: In time series analysis, one reason for high
multicollinearity between two variables is that over time both variables tend to move in
the same direction. One way of minimizing this dependence is to transform the variables.
That is, suppose Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + Uₜ.
This relation must also hold at time t – 1 because the origin of time is arbitrary anyway.
Therefore we have
Yₜ₋₁ = β₀ + β₁X₁ₜ₋₁ + β₂X₂ₜ₋₁ + Uₜ₋₁
Subtracting this from the above gives
Yₜ – Yₜ₋₁ = β₁(X₁ₜ – X₁ₜ₋₁) + β₂(X₂ₜ – X₂ₜ₋₁) + Vₜ
This is known as the first difference form because we run the regression not on the original
variables but on the differences of successive values of the variables. The first difference
regression model often reduces the severity of multicollinearity because, although the levels of
the variables may be highly correlated, their first differences usually are not.
1. State with reasons whether the following statements are true, false, or uncertain.
a) Despite perfect multicollinearity, OLS estimators are BLUE.
b) If an auxiliary regression shows that a particular R² is high, there is definite
evidence of high collinearity.
2. In data involving economic time series such as GDP, income, prices, unemployment, etc.,
multicollinearity is usually suspected. Why?
3. State three remedial measures if multicollinearity is detected.
4.6 SUMMARY
- In the presence of heteroscedasticity, the variances of the OLS estimators are not provided by the
usual OLS formulas. If we persist in using the usual OLS formulas, the t and F tests
based on them can be highly misleading, resulting in erroneous conclusions.
- Autocorrelation can arise for several reasons and makes the OLS estimators inefficient.
The remedy depends on the nature of the interdependence among the disturbances Uₜ.
- Multicollinearity is a question of degree and not of kind. Although there are no sure
methods of detecting collinearity, there are several indicators of it.
1 a) False. Though the OLS estimates are inefficient in the presence of
heteroscedasticity, they are still statistically unbiased.
4. F* = ΣÛ₂²/ΣÛ₁² = 769,899.2/144,771.5 ≈ 5.3
This well exceeds the relevant 5% critical F value, so there is evidence of heteroscedasticity.
Answer To Check Your Progress 2
4. d* = Σ(Ûₜ – Ûₜ₋₁)²/ΣÛₜ² = 537,192/573,069 = 0.937
From the Durbin-Watson table, with a 5 percent level of significance, n = 20 and k = 1, we find
that dL = 1.20 and dU = 1.41. Since d* = 0.937 is less than dL = 1.20, we conclude that there is
positive autocorrelation in the import function.
Answer To Check Your Progress 3
2. This is because the variables are highly interrelated. For example, an increase in income
brings about an increase in GDP. Moreover, an increase in unemployment usually brings
about a decline in prices.
3. Refer the text for the answer
Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5% level of significance.
Content
5.0 Aims and Objectives
5.1 Introduction
5.2 Models with Binary Regressors
5.3 Non-Linear Regression Models
5.3.1 Non-Linear Relationships in Economics
5.3.2 Specification and Estimation of Non-Linear Models
5.3.2.1 Polynomials
5.3.2.2 Log-log Models
5.3.2.3 Semi-log Models
5.3.2.4 Reciprocal Model
5.4 Summary
5.5 Answers to Check Your Progress
5.6 References
5.7 Model Examination Questions
This unit aims at introducing models with binary explanatory variable(s) and specification and
estimation of non-linear models.
5.1 INTRODUCTION
As mentioned in the previous section, this unit deals with the role of qualitative
explanatory variables in regression analysis and the functional forms of some non-linear
regression models. It will be shown that the introduction of qualitative variables, often called
dummy variables, makes the linear regression model a very flexible tool.
Example: If an individual is male, the variable takes the value 1;
if female, 0.
Variables which assume such 0 and 1 values are called dummy variables or binary variables or
qualitative variables or categorical variables or dichotomous variables.
Now let us take some examples with a single quantitative explanatory variable and two or more
qualitative explanatory variables.
Example 1: Suppose a researcher wants to find out whether sex makes any difference in a
college teacher’s salary, assuming that all other variables such as age, education level,
experience, etc. are held constant.
Consider the following hypothetical data on starting salaries of college teachers by sex.
Starting salary Sex
(Y) (1 = male, 0 = female)
22,000 1
19,000 0
18,000 0
21,700 1
18,500 0
21,000 1
20,500 1
17,000 0
17,500 0
21,200 1
Suppose the fitted model is Yᵢ = β₀ + β₁Dᵢ + Uᵢ, where D = 1 for male and 0 for female.
Since β̂₁ is statistically significant, the results indicate that the mean salaries of the two
categories are different; actually the female teacher’s average salary is lower than her male
counterpart’s. If all other variables are held constant, there is sex discrimination in the salaries of
the two sexes.
The estimates are β̂₀ = 18,000 (the mean female salary) and β̂₁ = 3,280 (the salary differential),
so that β̂₀ + β̂₁ = 21,280 (the mean male salary).
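The group-mean interpretation of the dummy coefficients can be verified directly from the table's data:

```python
import numpy as np

# Regressing salary on the sex dummy returns the two group means,
# with beta_1 the male-female differential.
Y = np.array([22000, 19000, 18000, 21700, 18500,
              21000, 20500, 17000, 17500, 21200], dtype=float)
D = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1], dtype=float)

A = np.column_stack([np.ones(len(Y)), D])
b0, b1 = np.linalg.lstsq(A, Y, rcond=None)[0]
print(b0, b1, b0 + b1)   # 18000 (female mean), 3280, 21280 (male mean)
```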
Example 2: Let us consider a regression with both quantitative and qualitative explanatory
variables, by including one quantitative explanatory variable in the model given in example 1
above:
Yᵢ = β₀ + β₁Dᵢ + β₂Xᵢ + Uᵢ
The female teacher is known as the base category since it is assigned the value of 0.
Note that the assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D = 1 for female and D = 0
for male. But in interpreting the results of the models which use the dummy variables it is
critical to know how the 1 and 0 values are assigned.
The coefficient 0 (intercept) is the intercept term for the base category. The coefficient 1
attached to the dummy variable D can be called the differential intercept coefficient because it
tells by how much the value of the intercept term of the category that receives the value of 1
differs from the intercept coefficient of the base category.
The other important point is on the number of dummy variables to be included in the model. If
a qualitative variable has m categories, introduce only m–1 dummy variables. In the above
examples, sex has two categories, and hence we introduced only a single dummy variable. If
this rule is not followed, we shall fall in to what might be called the dummy-variable trap, that
is, the situation of perfect multicollinearity.
Example 3: Let us take an example of regression on one quantitative variable and one
qualitative variable with more than two classes. Suppose we want to regress the annual
expenditure on health care by an individual on the income and education of the individual. Since
the variable education is qualitative in nature, we can have, as an example, three mutually
exclusive levels of education:
- Less than high school
- High school
- College
The number of dummies = 3 – 1 = 2. (Note the rule)
Let us consider the “less than high school education” category as the base category. The model
can be formulated as follows:
Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃Xᵢ + Uᵢ
where Yᵢ = annual expenditure on health care, X = income,
D₁ = 1 for high school education and 0 otherwise,
D₂ = 1 for college education and 0 otherwise.
[Figure 5.2: Expenditure on health care in relation to income for three levels of education,
showing three parallel lines for less than high school (the base category), high school, and
college education]
The intercept β₀ is the intercept of the base category. The differential intercepts β₁ and β₂
tell by how much the intercepts of the other two categories differ from the intercept of the base
category.
The technique of dummy variables can be easily extended to handle more than one qualitative
variable. If you consider example 1 above, it is possible to introduce another dummy variable,
for example, the color of the teacher, as an explanatory variable. Hence we will have an additional
dummy variable for color, i.e.,
D₂ = 1 if white and 0 otherwise.
Therefore, it is possible to include more than one quantitative variable and more than two
qualitative variables in our linear regression model.
The purpose of this section is to introduce you with models that are linear in the parameters but
non linear in the variables.
The assumption of linear relationship between the dependent and the explanatory variables may
not be acceptable for many economic relationships. Given the complexity of the real world we
expect non-linearities in most economic relationships.
Example 1: Cost functions are usually non-linear.
[Figure: U-shaped average total cost (ATC) curve]
Example 2: Production functions.
[Figure: total product (TP) curve plotted against input]
Other economic functions like demand, supply, income-consumption curves, etc can also be
non-linear.
Example 1: Polynomial models take the form
Y = β₀ + β₁X₁ + β₂X₁² + β₃X₁³ + … + U
If we consider the U-shaped average cost curve,
C = β₀ + β₁X – β₂X² + β₃X³ + U
where C = total cost and X = output.
To fit this model we need to transform some of the variables.
Example 2: Suppose we have data on the yield of wheat and the amount of fertilizer applied. Assume
that increased amounts of fertilizer begin to burn the crop, causing the yield to decline.
Y X X2
55 1 1
70 2 4
75 3 9
65 4 16
60 5 25
We want to fit the second-degree equation
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₁ᵢ² + Uᵢ
Let X₁ᵢ² = W. Then
Yᵢ = β₀ + β₁X₁ᵢ + β₂W + Uᵢ
This is linear both in terms of parameters and variables, so we apply OLS to the above function.
The results are presented as follows:
Ŷᵢ = 36 + 24.07Xᵢ – 3.9Xᵢ²
s.e. =     (6.471) (1.059)
t =         3.72   –3.71
t₀.₀₅(5 – 3) = 2.92
Since the computed t values exceed the critical value in absolute terms, both coefficients are significant.
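The fitted second-degree equation can be reproduced with a short sketch (the W = X² substitution makes the model linear in the parameters, so ordinary OLS applies):

```python
import numpy as np

# Fitting the second-degree equation of Example 2 with the substitution W = X^2.
Y = np.array([55, 70, 75, 65, 60], dtype=float)
X = np.array([1, 2, 3, 4, 5], dtype=float)

A = np.column_stack([np.ones(5), X, X**2])   # columns: 1, X, W = X^2
b = np.linalg.lstsq(A, Y, rcond=None)[0]
print(b)   # approximately 36, 24.07, -3.93, matching the quoted fit
```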
Example: 1. ln Yᵢ = β₀ + β₁Xᵢ + Uᵢ
2. Yᵢ = β₀ + β₁·ln Xᵢ + Uᵢ
The above models are called semilog models. We call the first model log-lin model and the
second model is known as lin-log model. The name given to the above models is based on
whether the dependent variable or the explanatory variable is in the log form.
Multiplying the relative change in Y by 100 will give you the percentage change in Y for an
absolute change in X.
Example: ln GN̂Pₜ = 6.96 + 0.027T
              s.e. = (0.015) (0.012)
r² = 0.95,  F₁,₁₃ = 260.34
where GNP = real gross national product and T = time (in years).
The above result shows that the real GNP of the country was growing at the rate of 2.7 percent
per year (for the sample period). It is also possible to estimate a linear trend model:
GN̂Pₜ = 1040.11 + 35T
           (18.9)  (2.07)
r² = 0.95,  F₁,₁₃ = 284.7
This model implies that for the sample period the real GNP was growing at the constant
absolute amount of about $35 billion a year. The choice between the log-lin and linear model
will depend up on whether one is interested in the relative or the absolute change in the GNP.
NB: You cannot compare the r² values of the two models since the dependent variables are
different.
The reciprocal model, Yᵢ = β₀ + β₁(1/Xᵢ) + Uᵢ, is non-linear in
the variable X because X enters inversely or reciprocally, but the model is linear in β₀ and β₁ and is
therefore a linear regression model. The method of OLS can be applied to estimate the model.
If we let 1/Xᵢ = Zᵢ, the model becomes Yᵢ = β₀ + β₁Zᵢ + Uᵢ, which is linear in both parameters and variables.
As X increases indefinitely, Y approaches the limiting or asymptotic value β₀. Some examples are shown below.
[Figure 5.4: Three shapes of the reciprocal model Yᵢ = β₀ + β₁(1/Xᵢ), panels (a), (b) and (c)]
We can give examples for each of the above functions (figures a, b and c):
1. The average fixed cost curve relates the average fixed cost of production to the level of
output. As indicated in fig. (a), the AFC declines continuously as output increases.
2. The Phillips curve, which relates the unemployment rate to the rate of inflation, is a
good example of fig. (b) above.
3. The reciprocal model of fig. (c) is an appropriate Engel expenditure curve, which relates a
consumer’s expenditure on a commodity to his total expenditure or income.
5.4 SUMMARY
Dummy variables
Variables which assume 0 and 1 values are called dummy variables; they are also known as
binary variables, qualitative variables, categorical variables or dichotomous variables.
Reciprocal models
The function defined as
Yᵢ = β₀ + β₁(1/xᵢ) + Uᵢ is known as a reciprocal model.
5.2.1
1 a) Yᵢ = β₀ + β₁Xᵢ + U
5.6 REFERENCES
3. The following table gives data on the annual percentage change in wage rates (Y) and the
unemployment rate (X) for a country for the period 1950 – 1966.
Percentage increase in wage rates (Y)    Unemployment rate (%) (X)
Contents
6.0 Aims and Objective
6.1 Introduction
6.2 Simultaneous Dependence of Economic Variables
6.3 Identification Problem
6.4 Test of Simultaneity
6.5 Approaches to Estimation
6.6 Summary
6.7 Answers to Check Your Progress
6.8 Model Examination
6.9 References
The purpose of this unit is to introduce the student, very briefly, to the concept of
simultaneous dependence of economic variables. Thus, when the student has completed this
unit he/she will:
understand the concept of simultaneous equations
distinguish between endogenous and exogenous variables in a model
be able to derive reduced form equations from structural equations
6.1 INTRODUCTION
The application of least squares to a single equation assumes, among others, that the
explanatory variables are truly exogenous, that there is one-way causation between the
dependent variable (Y) and the explanatory variables (X). When this is not the case, the
function cannot be treated in isolation as a single equation model, but belongs to a wider system
of equations which describes the relationships among all the relevant variables. In such cases we
must use a multi-equation model which would include separate equations in which Y and X
would appear as endogenous variables. A system describing the joint dependence of variables is
called a system of simultaneous equations.
In the single equations discussed in the previous units the cause-and-effect relationship is
unidirectional: the explanatory variables are the cause and the dependent variable is the
effect.
However, there are situations where there is a two-way flow of influence among economic
variables; that is, one economic variable affects another economic variable(s) and is, in turn,
affected by it (them). In such cases we need to consider more than one equation and thus come up
with simultaneous equation models, in which there is one regression equation for each
jointly dependent (endogenous) variable.
The first question we need to answer is: what happens if the parameters of each equation are estimated by applying, say, the method of OLS, disregarding the other equations in the system? Recall that one of the crucial assumptions of the method of OLS is that the explanatory X variables are either non-stochastic or, if stochastic (random), distributed independently of the stochastic disturbance term. If neither of these conditions is met, the least-squares estimators are not only biased but also inconsistent; that is, as the sample size increases indefinitely, the estimators do not converge to their true (population) values.
Example: Recall that the price of a commodity and the quantity (bought and sold) are determined by the intersection of the demand and supply curves for that commodity. Consider the following linear demand and supply models.
Demand function: Qtd = α0 + α1Pt + U1t ……(6.3)
Supply function: Qts = β0 + β1Pt + U2t ……(6.4)
where Qtd = quantity demanded, Qts = quantity supplied, P = price and t = time.
Note that P and Q are jointly dependent variables. If U1t changes because of changes in other variables affecting Qtd (such as income and tastes), the demand curve shifts. Recall that such a shift in demand changes both P and Q. Similarly, a change in U2t (because of changes in weather and the like) will shift the supply curve, again affecting both P and Q. Because of this simultaneous dependence between Q and P, U1t and Pt in (6.3), and U2t and Pt in (6.4), cannot be independent. Therefore, a regression of Q on P as in (6.3) would violate an important assumption of the classical linear regression model, namely, the assumption of no correlation between the explanatory variable(s) and the disturbance term. In summary, the above discussion reveals that, in contrast to single equation models, in simultaneous equation models more than one dependent, or endogenous, variable is involved, necessitating as many equations as the number of endogenous variables. As a consequence such an endogenous explanatory variable becomes stochastic and is usually correlated with the disturbance term of the equation in which it appears as an explanatory variable.
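The bias can be seen in a small simulation. The following sketch (simulated data; all parameter values illustrative) generates equilibrium P and Q from the demand-and-supply model (6.3)-(6.4) and then regresses Q on P by OLS as if (6.3) were a single-equation model; the estimated slope is far from the true demand slope because P is correlated with U1t.

import numpy as np

rng = np.random.default_rng(0)
a0, a1 = 10.0, -1.0          # true demand intercept and slope (illustrative)
b0, b1 = 2.0, 1.0            # true supply intercept and slope (illustrative)
n = 10_000
u1 = rng.normal(0, 1, n)     # demand disturbance
u2 = rng.normal(0, 1, n)     # supply disturbance

# Market clearing: a0 + a1*P + u1 = b0 + b1*P + u2, solved for P
P = (b0 - a0 + u2 - u1) / (a1 - b1)
Q = a0 + a1 * P + u1         # equilibrium quantity (from the demand equation)

# OLS of Q on P, ignoring the rest of the system
X = np.column_stack([np.ones(n), P])
slope = np.linalg.lstsq(X, Q, rcond=None)[0][1]
print("true demand slope:", a1, "  OLS estimate:", round(slope, 3))  # near 0, not -1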
Recall the simple Keynesian model of income determination: the consumption function Ct = β0 + β1Yt + Ut ……(6.6) and the income identity Yt = Ct + It ……(6.7). Both equations 6.6 and 6.7 are structural, or behavioral, equations because they portray the structure of an economy, equation (6.7) being an identity. The β's are known as the structural parameters or coefficients. From the structural equations one can solve for the endogenous variables and derive the reduced-form equations and the associated reduced-form coefficients. If equation (6.6) is substituted into equation (6.7) and we solve for Yt, we obtain
Yt = β0/(1 − β1) + (1/(1 − β1))It + Ut/(1 − β1)
= π0 + π1It + Wt ……(6.8)
where π0 = β0/(1 − β1), π1 = 1/(1 − β1) and Wt = Ut/(1 − β1).
The reduced-form coefficients (the π's) are also known as impact, or short-run, multipliers, because they measure the immediate impact on the endogenous variable of a unit change in the value of the exogenous variable. If in the preceding Keynesian model the investment expenditure (I) is increased by, say, $1 and if the marginal propensity to consume (i.e., β1) is assumed to be 0.8, then from (6.8) we obtain π1 = 1/(1 − 0.8) = 5. This result means that increasing investment by $1 will immediately (i.e., in the current time period) lead to an increase in income of $5, that is, a fivefold increase.
Notice an interesting feature of the reduced-form equations. Since only the predetermined variables and stochastic disturbances appear on the right-hand side of these equations, and since the predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS method can be applied to estimate the coefficients of the reduced-form equations (the π's). This is sufficient if the researcher is only interested in predicting the endogenous variables or only wishes to estimate the size of the multipliers (i.e., the π's).
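A minimal sketch of this idea, with simulated data and illustrative parameter values: the reduced form (6.8) is estimated by OLS, and since π1 = 1/(1 − β1), the MPC can be recovered as β1 = 1 − 1/π1.

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 50.0, 0.8           # true consumption-function parameters (illustrative)
n = 500
I = rng.uniform(10, 100, n)        # exogenous investment
U = rng.normal(0, 5, n)
Y = (beta0 + I + U) / (1 - beta1)  # reduced form implied by (6.6) and (6.7)

X = np.column_stack([np.ones(n), I])
pi0_hat, pi1_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print("investment multiplier pi1:", round(pi1_hat, 2))  # close to 1/(1 - 0.8) = 5
print("implied MPC beta1:", round(1 - 1 / pi1_hat, 2))  # close to 0.8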
Note that the identification problem is a mathematical (as opposed to statistical) problem associated with simultaneous equation systems. It is concerned with the question of whether it is possible to obtain meaningful estimates of the structural parameters from the reduced-form coefficients. An identified equation is said to be exactly (or fully or just) identified if unique numerical values of its structural parameters can be obtained. It is said to be over identified if more than one numerical value can be obtained for some of the parameters of the structural equations. The circumstances under which each of these cases occurs are shown in the following discussion.
a) Under Identification
Consider the demand-and-supply model (6.3) and (6.4), together with the market clearing, or equilibrium, condition (6.5) that demand equals supply. By the equilibrium condition (i.e., Qtd = Qts) we obtain the equilibrium price
Pt = π0 + Vt ……(6.11)
where π0 = (β0 − α0)/(α1 − β1) and Vt = (U2t − U1t)/(α1 − β1)
Substituting Pt from (6.11) into (6.3) or (6.4) we obtain the following equilibrium quantity:
Qt = π1 + Wt ……(6.12)
where π1 = (α1β0 − α0β1)/(α1 − β1)
and Wt = (α1U2t − β1U1t)/(α1 − β1)
Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters: α0, α1, β0 and β1. But there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients. Recall from high school algebra that to estimate four unknowns we must have four (independent) equations; in general, to estimate k unknowns we must have k (independent) equations. What all this means is that, given time series data on P (price) and Q (quantity) and no other information, there is no way the researcher can be sure whether he/she is estimating the demand function or the supply function. That is, a given Pt and Qt represent simply the point of intersection of the appropriate demand and supply curves, because of the equilibrium condition that demand is equal to supply.
b) Just (or Exact) Identification
Suppose now that income (It) enters the demand function and lagged price (Pt−1) enters the supply function:
Demand function: Qt = α0 + α1Pt + α2It + U1t ……(6.13)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……(6.14)
Equating demand to supply yields the equilibrium price
Pt = π0 + π1It + π2Pt−1 + Vt ……(6.16)
where π0 = (β0 − α0)/(α1 − β1), π1 = −α2/(α1 − β1),
π2 = β2/(α1 − β1), Vt = (U2t − U1t)/(α1 − β1)
Substituting the equilibrium price (6.16) into the demand or supply equation of (6.13) or (6.14), we obtain the corresponding equilibrium quantity:
Qt = π3 + π4It + π5Pt−1 + Wt ……(6.17)
where the reduced-form coefficients are
π3 = (α1β0 − α0β1)/(α1 − β1), π4 = −α2β1/(α1 − β1),
π5 = α1β2/(α1 − β1), Wt = (α1U2t − β1U1t)/(α1 − β1)
The demand-and-supply model given in equations (6.13) and (6.14) contains six structural coefficients, α0, α1, α2, β0, β1 and β2, and there are six reduced-form coefficients, π0, π1, π2, π3, π4 and π5, with which to estimate them. Thus we have six equations in six unknowns, and normally we should be able to obtain unique estimates. Therefore the parameters of both the demand and supply equations can be identified, and the system as a whole is identified.
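In the just-identified case the structural coefficients can be recovered uniquely from the reduced-form coefficients ("indirect least squares"). A sketch of the arithmetic, using a set of reduced-form values constructed to be internally consistent (all numbers illustrative); the inversion formulas follow from the expressions for π0 through π5 above:

# Hypothetical reduced-form coefficients (consistent with alpha = (10, -1, 0.5)
# and beta = (2, 1, 0.2) in the model (6.13)-(6.14))
pi0, pi1, pi2 = 4.0, 0.25, -0.1
pi3, pi4, pi5 = 6.0, 0.25, 0.1

alpha1 = pi5 / pi2                 # = -1   (since pi5/pi2 = alpha1)
beta1 = pi4 / pi1                  # =  1   (since pi4/pi1 = beta1)
alpha0 = pi3 - alpha1 * pi0        # = 10
beta0 = pi3 - beta1 * pi0          # =  2
alpha2 = pi1 * (beta1 - alpha1)    # = 0.5
beta2 = pi2 * (alpha1 - beta1)     # = 0.2
print(alpha0, alpha1, alpha2, beta0, beta1, beta2)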
c) Over Identification
Note that for certain goods and services, wealth of the consumer is another important determinant of demand. Therefore, the demand function (6.13) can be modified as follows, keeping the supply function as before:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t ……(6.18)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……(6.19)
where R represents wealth.
Equating demand to supply, we obtain the following equilibrium price and quantity:
Pt = π0 + π1It + π2Rt + π3Pt−1 + Vt ……(6.20)
Qt = π4 + π5It + π6Rt + π7Pt−1 + Wt ……(6.21)
The system now has seven structural coefficients (α0, α1, α2, α3, β0, β1 and β2) but eight reduced-form coefficients (π0 through π7). Notice that the situation is the opposite of the case of under identification, where there was too little information: here there is too much, and more than one value can be obtained for some parameters (for example, β1 can be estimated either as π5/π1 or as π6/π2, and the two estimates will generally differ in a sample), so the supply equation is over identified. The only way in which the structural parameters of unidentified (or under identified) equations can be estimated is to respecify the model using additional information.
In a simple example such as the foregoing it is easy to check for identification; in more complicated systems, however, it is not so easy. This time-consuming procedure can be avoided by resorting to either the order condition or the rank condition of identification. Although the order condition is easy to apply, it provides only a necessary condition for identification. The rank condition, on the other hand, is both a necessary and a sufficient condition for identification. [Note: the order and rank conditions for identification will not be discussed here, since the objective of this unit is only to briefly introduce the reader to simultaneous equations. For a detailed and advanced discussion, readers can refer to the reference list stated at the end of this unit.]
where P̂t are the estimated Pt and V̂t are the estimated residuals. Substituting (6.27) into (6.23) gives regression (6.28), in which both P̂t and V̂t appear as regressors. Now, under the null hypothesis that there is no simultaneity, the correlation between V̂t and U2t should be zero, asymptotically. Thus, if we run the regression (6.28) and find that the coefficient of V̂t in (6.28) is statistically zero, we can conclude that there is no simultaneity problem.
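A sketch of how this two-step test might be run (simulated data; variable names and parameter values are illustrative, and statsmodels is assumed to be available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
I = rng.uniform(0, 10, n)                  # exogenous demand shifter (income)
u1 = rng.normal(0, 1, n)
u2 = rng.normal(0, 1, n)
P = 4 + 0.25 * I + (u1 - u2) / 2           # simulated equilibrium price
Q = 10 - P + 0.5 * I + u1                  # simulated equilibrium quantity (demand)

# Step 1: regress P on the predetermined variable(s), keep the residuals V_hat
step1 = sm.OLS(P, sm.add_constant(I)).fit()
V_hat = step1.resid

# Step 2: add V_hat to the structural equation; a significant t-statistic on
# V_hat rejects the null hypothesis of no simultaneity
X = sm.add_constant(np.column_stack([P, I, V_hat]))
step2 = sm.OLS(Q, X).fit()
print(step2.tvalues)                       # last entry is the t on V_hat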
At the outset it may be noted that the estimation problem is rather complex because there is a variety of estimation techniques with varying statistical properties. In view of the introductory nature of this unit we shall consider the following techniques very briefly.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
It = α0 + α1Yt + Ut
Yt = Qt + It
Write the reduced-form equations for Yt and It.
We have seen that a unique feature of a simultaneous equation model is that the endogenous variable in one equation may appear as an explanatory variable in another equation of the system, so that the OLS method may not be applicable. The identification problem in this regard asks whether one can obtain unique numerical estimates of the structural coefficients from the estimated reduced-form coefficients. This leads to the issue of just identified, under identified and over identified equations. Note also that in the presence of simultaneity OLS is generally not applicable, so it is imperative to test for simultaneity explicitly. For this purpose the Hausman specification test can be used. There are several methods of estimating a simultaneous equation model.
Answer to the check-your-progress exercise above: substituting the investment function into the identity and solving gives
Yt = α0/(1 − α1) + (1/(1 − α1))Qt + (1/(1 − α1))Ut
It = α0/(1 − α1) + (α1/(1 − α1))Qt + (1/(1 − α1))Ut
1. What is the economic meaning of the imposition of a zero restriction on the parameters of a model?
2. Consider the following extended Keynesian model of income determination:
Ct = β0 + β1Yt + β2It + U1t
It = α0 + α1Yt−1 + U2t
Tt = γ0 + γ1Yt + U3t
Suppose the estimated reduced-form equations are:
Y1t = 4 + 8X1t
Y2t = 2 + 12X1t
a) Which structural coefficients, if any, can be estimated from the reduced-form coefficients?
b) Show that the reduced-form parameters measure the total effect of a change in the exogenous variables.
Contents
7.0 Aims and Objective
7.1 Introduction
7.2 Qualitative Response Models
7.2.1 Categories of Qualitative Response Model
7.3 The Linear Probability Model (LPM)
7.4 The Logit Model
7.5 The Probit Model
7.6 The Tobit Model
7.7 Summary
7.8 Answers to Check Your Progress Questions
7.9 References
7.10 Model Examination Questions
The purpose of this unit is to familiarize students with the concept of qualitative dependent
variable in a regression model and the estimating problems associated with such models.
7.1 INTRODUCTION
Example 1: Y = β0 + β1X1 + β2X2
Y = 1, if individual i attended college
= 0, otherwise
In the above example the dependent variable Y takes on only two values (i.e., 0 and 1). Conventional regression cannot be used directly to analyze a qualitative dependent variable model; such models are analyzed in the general framework of probability models.
ii. Ordinal variables: these are variables whose categories can be ranked.
Example: – Rank to indicate political orientation
Y = 1, radical
= 2, liberal
= 3, conservative
- Rank according to education attainment
Y = 1, primary education
= 2, secondary education
= 3, university education
iii. Nominal variables: these occur when there are multiple outcomes that cannot be ordered.
Example: occupation can be grouped as farming, fishing, carpentry, etc.
Y = 1, farming
= 2, fishing
= 3, carpentry
= 4, livestock
(note that the numbers are assigned arbitrarily)
iv. Count variables: these indicate the number of times some event has occurred.
Example: how many strikes have occurred.
Now let us turn our attention to the four most commonly used approaches to estimating binary response models (types of binomial models):
1. Linear probability models
2. The logit model
3. The probit model
4. The tobit (censored regression) model.
The above model expresses the dichotomous Yi as a linear function of the explanatory variable Xi. Such models are called linear probability models (LPM), since E(Yi/Xi), the conditional expectation of Yi given Xi, can be interpreted as the conditional probability that the event will occur given Xi; that is, Pr(Yi = 1/Xi). Thus, in the preceding case, E(Yi/Xi) gives the probability of a family owning a house when its income is the given amount Xi. The justification of the name LPM can be seen as follows. Letting Pi denote the probability that Yi = 1, the distribution of Yi is:
Yi = 0 with probability 1 − Pi
Yi = 1 with probability Pi
(the probabilities sum to 1)
Therefore, by the definition of mathematical expectation, we obtain
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ……(3)
Now, comparing (2) with (3), we can equate
E(Yi/Xi) = β0 + β1Xi = Pi ……(4)
1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. To see this, note that Ui has the following probability distribution:
Yi = 0: Ui = −β0 − β1Xi, with probability 1 − Pi
Yi = 1: Ui = 1 − β0 − β1Xi, with probability Pi
Now, by definition, Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 by assumption. Therefore, using the preceding probability distribution of Ui, we obtain
Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
= (β0 + β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)   [using Pi = β0 + β1Xi]
= (β0 + β1Xi)(1 − β0 − β1Xi)
or Var(Ui) = E(Yi/Xi)[1 − E(Yi/Xi)] = Pi(1 − Pi)
This shows that the variance of Ui is heteroscedastic, because it depends on the conditional expectation of Y, which of course depends on the value taken by X. Thus the OLS estimator of β is inefficient and the standard errors are biased, resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbances (U's) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference (hypothesis testing, etc.). But the assumption of normality for Ui is no longer tenable for the LPM because, like Yi, Ui takes on only two values:
Ui = Yi − β0 − β1Xi
When Yi = 1, Ui = 1 − β0 − β1Xi, and when Yi = 0, Ui = −β0 − β1Xi.
Obviously Ui cannot be assumed to be normally distributed. Recall, however, that normality is not required for the OLS estimates to be unbiased.
3. Nonsensical Predictions
The LPM produces predicted values outside the normal range of probabilities (0, 1): it can predict values of Y that are negative or greater than 1. This is the real problem with OLS estimation of the LPM.
4. Functional Form
Since the model is linear, a unit increase in X results in a constant change of β1 in the probability of the event, holding all other variables constant. The increase is the same regardless of the current value of X. In many applications this is unrealistic. When the outcome is a probability, it is often substantively reasonable that the effects of the independent variables will diminish as the predicted probability approaches 0 or 1.
Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.
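The nonsensical-prediction problem is easy to demonstrate. A sketch with simulated home-ownership data (all values illustrative; statsmodels assumed available): an LPM is fitted by OLS, and some fitted "probabilities" fall outside [0, 1].

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
income = rng.uniform(0, 30, n)
p_true = 1 / (1 + np.exp(-(-4 + 0.3 * income)))   # true P(own house | income)
own = rng.binomial(1, p_true)                      # observed 0/1 outcome

lpm = sm.OLS(own, sm.add_constant(income)).fit()
fitted = lpm.fittedvalues
print("share of fitted values outside [0, 1]:",
      round(np.mean((fitted < 0) | (fitted > 1)), 3))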
We have seen that the LPM has many problems: non-normality of Ui, heteroscedasticity of Ui, the possibility of Ŷi lying outside the 0-1 range, and generally lower R² values. But these problems are surmountable. The fundamental problem with the LPM is that it is not logically attractive as a model, because it assumes that Pi = E(Y = 1/X) increases linearly with X; that is, the marginal or incremental effect of X remains constant throughout.
Example: The LPM for home ownership estimated by OLS is:
Ŷi = −0.9457 + 0.1021Xi, R² = 0.8048
The above regression is interpreted as follows:
- The intercept of −0.9457 gives the "probability" that a family with zero income will own a house. Since this value is negative, and since a probability cannot be negative, we treat it as zero.
- The slope value of 0.1021 means that for a unit change in income, on average the probability of owning a house increases by 0.1021, or about 10 percent. This is so irrespective of the level of income, which seems patently unrealistic. In reality one would expect Pi to be non-linearly related to Xi.
Therefore, what we need is a (probability) model with the following two features:
1. As Xi increases, Pi = E(Y = 1/X) increases but never steps outside the 0-1 interval.
2. The relationship between Pi and Xi is non-linear, that is, "one which approaches zero at slower and slower rates as Xi gets small and approaches one at slower and slower rates as Xi gets very large."
Geometrically, the model we want would look something like fig. 7.1 below.
[Figure 7.1: an S-shaped curve rising from 0 to 1, with the probability (CDF) on the vertical axis and X on the horizontal axis]
The above S-shaped curve closely resembles the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that the variable takes a value less than or equal to x.) Therefore, one can easily use a CDF to model regressions where the response variable is dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
a) the logistic, which gives rise to the logit model
b) the normal, which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not. Taking the natural log of the odds ratio we obtain the logit:
Li = ln[Pi/(1 − Pi)] = β0 + β1Xi
That is, the log of the odds ratio is linear in X. The intercept β0 gives the log-odds in favor of owning a house when income is zero; like most interpretations of intercepts, this may not have any physical meaning. Note that with individual-level data the model cannot be estimated by OLS, since Pi is then either 1 or 0 and the corresponding values of L are meaningless (L = ln(1/0) and L = ln(0/1)). Therefore estimation is by the maximum likelihood method (because of its mathematical complexity we will not discuss the method here).
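In practice the maximization is done by software. A minimal sketch fitting a logit by maximum likelihood on the simulated home-ownership data used in the LPM sketch above (statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
income = rng.uniform(0, 30, n)
p_true = 1 / (1 + np.exp(-(-4 + 0.3 * income)))
own = rng.binomial(1, p_true)

logit_res = sm.Logit(own, sm.add_constant(income)).fit(disp=0)
print(logit_res.params)      # ML estimates of beta0 (~ -4) and beta1 (~ 0.3)
print(logit_res.prsquared)   # McFadden pseudo-R2 (not comparable to an OLS R2)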
Example: Logit estimates. Assume that Y is related to the variables X1 through X5 as follows:
Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui
The logit estimation results are:
Ŷi = −10.84 − 0.74X1 − 11.6X2 − 5.7X3 − 1.3X4 + 2.5X5
t = (−3.20) (−2.51) (−3.01) (−2.4) (−1.37) (1.62)
The variables X1, X2 and X3 are statistically significant at the 99 percent level, and X4 at the 90 percent level. The estimated result shows that X1, X2 and X3 have a negative effect on the probability of the event occurring (i.e., Y = 1), while X5 (with positive β5) has a positive effect on that probability.
Note: The parameters of the logit model are not the same as the marginal effects we are used to when analyzing OLS results.
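The point can be made with a few lines of arithmetic: in the logit model the effect of X on the probability is dP/dX = β1·P(1 − P), so it varies with the level of X. A sketch with illustrative coefficient values (fitted statsmodels results objects also offer a get_margeff() method that computes such effects from estimates):

import numpy as np

beta0, beta1 = -4.0, 0.3        # illustrative logit coefficients
for x in (5.0, 13.3, 25.0):     # low, middle and high values of X
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    print(f"X = {x:5.1f}   P = {p:.3f}   dP/dX = {beta1 * p * (1 - p):.4f}")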
The estimating model that emerges from the normal CDF is popularly known as the probit model. Here the observed dependent variable Y takes on the value 0 or 1 according to the following criterion. Define a latent variable Y* such that
Yi* = β′Xi + εi
Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
The latent variable Y* is continuous (−∞ < Y* < ∞); it generates the observed binary variable Y. The observed variable Y can be in one of two states:
i) if the event occurs, it takes the value 1
ii) if the event does not occur, it takes the value 0
The latent variable is assumed to be a linear function of the observed X's through the structural model.
Example: Let Y measure whether one is employed or not; it is a binary variable taking values 0 and 1. Let Y* measure the willingness to participate in the labor market; it changes continuously and is unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor market increases (Y*, the willingness to participate, cannot be observed). The individual's decision changes (Y becomes zero) if the wage rate is below the critical point.
Since Y* is continuous, the model avoids the problems inherent in the LPM (the non-normality of the error term and heteroscedasticity). However, since the latent dependent variable is unobserved, the model cannot be estimated using OLS; maximum likelihood is used instead. Most often the choice is between normal errors and logistic errors, resulting in the probit (normit) and logit models, respectively. The coefficients derived from the maximum likelihood (ML) function will be those of the probit model if we assume a normal distribution, and those of the logit model if we assume that the appropriate distribution of the error term is logistic. As Amemiya suggests, a logit estimate of a parameter multiplied by 0.625 gives a fairly good approximation of the probit estimate of the same parameter. Similarly, the coefficients of the LPM and logit models are related as follows:
βLPM = 0.25 βlogit, except for the intercept
βLPM = 0.25 βlogit + 0.5 for the intercept
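Amemiya's rule of thumb is easy to check by fitting both models to the same data. A sketch on simulated data (illustrative values; statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(0, 1, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))
X = sm.add_constant(x)

logit_b = sm.Logit(y, X).fit(disp=0).params
probit_b = sm.Probit(y, X).fit(disp=0).params
print("probit coefficients:  ", probit_b)
print("0.625 * logit coeffs: ", 0.625 * logit_b)   # roughly equal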
Summary
- Logit function:
P(Y = 1/X) = e^(βXi)/(1 + e^(βXi)) = 1/(1 + e^(−βXi))
(we obtain the second form by dividing both the numerator and the denominator by e^(βXi))
- Probit function:
P(Y = 1/X) = Φ(βXi)
where Φ(.) is the normal cumulative distribution function, whose density is
f(X) = (1/√(2πσ²)) exp[−(X − μ)²/(2σ²)]
Therefore, it is possible to avoid the problems of nonsensical results and of a constant impact of X on the dependent variable, since both models are non-linear.
An extension of the probit model is the tobit model, developed by James Tobin. To explain this model, let us consider the home ownership example. Suppose we want to find out the amount of money a consumer spends on buying a house in relation to his or her income and other economic variables. Now we have a problem: if a consumer does not purchase a house, we obviously have no data on housing expenditure for that consumer; we have such data only for consumers who actually purchase a house. Thus consumers are divided into two groups: one consisting of, say, N1 consumers about whom we have information on the regressors (say income, interest rate, etc.) as well as the regressand (the amount of expenditure on housing), and another consisting of, say, N2 consumers about whom we have information only on the regressors but not on the regressand. A sample in which information on the regressand is available only for some observations is known as a censored sample. Therefore, the tobit model is also known as a censored regression model.
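statsmodels has no built-in tobit estimator, so the sketch below hand-codes the censored-regression log-likelihood and maximizes it with scipy (simulated data, illustrative values): uncensored observations contribute the normal density, censored ones the probability mass P(Y* ≤ 0).

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 1_000
x = rng.normal(0, 1, n)
y_star = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # latent expenditure Y*
y = np.maximum(y_star, 0.0)                      # observed Y, censored at zero

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)                    # keeps sigma positive
    mu = b0 + b1 * x
    ll = np.where(y > 0,
                  norm.logpdf(y, mu, sigma),     # density for uncensored Y
                  norm.logcdf(-mu / sigma))      # P(Y* <= 0) for censored Y
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[0], res.x[1], np.exp(res.x[2]))      # estimates of b0, b1, sigma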
7.7 SUMMARY
- Latent variable
- Similarities and differences between the logit and probit models
- Both the logit and probit models guarantee that the estimated probabilities lie in the 0-1 range and that they are non-linearly related to the explanatory variables.
- The interpretation of the logit and probit models is the same.
- Estimation is by MLE.
- The tobit model is an extension of the probit model and is mainly applied when we have censored data:
Yi* = βXi + εi
Yi = Yi* if Yi* > 0
Yi = 0 if Yi* ≤ 0
7.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS
Answers to check your progress questions in this unit are already discussed in the text.
Contents
8.0 Aims and Objective
8.1 Introduction
8.2 Stationarity and Unit Roots
8.3 Cointegration Analysis and Error Correction Mechanism
8.4 Summary
8.5 Answers to Check Your Progress
8.6 Model Examination
The aim of this unit is to extend the discussion of regression analysis by incorporating a brief
discussion of time series econometrics.
8.1 INTRODUCTION
Recall from our unit one discussion that one of the two important type of data used in empirical
analysis is time series data. Time series data have become so frequently and intensively used in
empirical research that econometricians have recently begun to pay very careful attention to
such data.
In this very brief discussion we first define the concept of stationary time series and then
develop tests to find out whether a time series is stationary. In this connection we introduce
some related concepts, such as unit roots. We then distinguish between trend stationary and
Any time series data can be thought of as being generated by a stochastic or random process. A
type of stochastic process that has received a great deal of attention by time series analysis is
the so-called stationary stochastic process.
Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance, or lag, between the two periods and not on the actual time at which the covariance is computed. A non-stationary series, on the other hand, has no long-run mean to which the variable returns, and its variance grows without bound as time goes by.
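In symbols, a (weakly) stationary series Yt satisfies the following three conditions, where k denotes the lag:
Mean: E(Yt) = μ, constant for all t
Variance: Var(Yt) = E(Yt − μ)² = σ², constant for all t
Covariance: Cov(Yt, Yt+k) = γk, a function of k only, not of t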
For many time series, however, stationarity is unlikely to hold. If this is the case, the conventional hypothesis testing procedures based on the t, F, chi-square and other tests become suspect. In other words, if the variables in the model are non-stationary, the result is spurious regression: the fact that the variables share a common trend will tend to produce an apparently significant relationship between them. The relationship then reflects contemporaneous correlation arising from the common trend rather than a true causal relationship. Hence, with non-stationary variables, OLS generates misleading results.
Different mechanisms have been developed that enable non-stationary variables to attain stationarity. It has been argued that if a variable has a deterministic trend (i.e., one that is perfectly predictable rather than variable or stochastic), including a trend variable in the regression removes the trend component and makes the variable stationary. For example, in the regression of personal consumption expenditure (PCE) on income (PDI), if we observe a very high r2, which is typically the case, it may reflect not the true degree of association between the two variables but simply the common trend present in them; that is, with time the two variables move together. To avoid such spurious association, the common practice is to regress PCE on PDI and t (time), the trend variable. The coefficient of PDI obtained from this regression then reflects the net influence of PDI on PCE, the common trend having been removed.
However, most time series have a stochastic trend (that is, the trend is itself variable and therefore cannot be predicted with certainty). In such cases, in order to avoid the problems associated with spurious regression, pre-testing the variables for the existence of unit roots (i.e., non-stationarity) becomes compulsory. In general, if a variable has a stochastic trend, it needs to be differenced in order to attain stationarity. Such a process is called a difference stationary process.
In this regard, the Dickey-Fuller (DF) test enables us to assess the existence of stationarity. The simplest DF test starts with the following first-order autoregressive model:
Yt = ρYt−1 + Ut ……(8.1)
Subtracting Yt−1 from both sides gives
Yt − Yt−1 = ρYt−1 − Yt−1 + Ut = (ρ − 1)Yt−1 + Ut
ΔYt = δYt−1 + Ut ……(8.2)
where ΔYt = Yt − Yt−1 and δ = ρ − 1.
The test for stationarity is conducted on the parameter δ. If δ = 0 (i.e., ρ = 1), then ΔYt = Ut and hence the variable Y is not stationary (has a unit root). In time series econometrics, a series that has a unit root is known as a random walk, because the change in Y (ΔYt) is purely the result of the error term Ut. A random walk is thus an example of a non-stationary time series.
For the test of stationarity the hypotheses are formulated as follows:
H0: δ = 0 (i.e., ρ = 1)
H1: δ < 0 (i.e., ρ < 1)
Note that (8.2) is appropriate only when the series Yt has zero mean and no trend. But it is impossible to know a priori whether the true Yt has zero mean and no trend. For this reason, including a constant (drift) and a time trend in the regression is recommended. Thus (8.2) is expanded to the following form:
ΔYt = α + βT + δYt−1 + Ut ……(8.3)
Here as well the parameter δ is used in testing for stationarity. Rejecting the null hypothesis (H0: δ = 0) implies stationarity; that is, Yt is influenced by Yt−1 in addition to Ut, so the change in Yt (i.e., ΔYt) does not follow a random walk. Accepting the null hypothesis, by contrast, suggests the existence of a unit root (non-stationarity).
The DF test has a serious limitation in that it suffers from residual autocorrelation: it is inappropriate to use the DF distribution in the presence of autocorrelated errors. To remedy this weakness, the DF model is augmented with additional lagged first differences of the dependent variable; the result is the Augmented Dickey-Fuller (ADF) test, whose regression avoids autocorrelation among the residuals. Incorporating lagged first differences of Yt in (8.3) gives the following ADF model:
ΔYt = α + βT + δYt−1 + Σ(i=1 to k) φi ΔYt−i + Ut ……(8.4)
Example: Let us illustrate the ADF test using the personal consumption expenditure (PCE) data of Ethiopia. Suppose the regression of PCE corresponding to (8.4) gave the following result:
ΔPCEt = 233.08 + 1.64t − 0.06PCEt−1 + Σ φi ΔPCEt−i ……(8.5)
For our purpose the important statistic is the τ (tau) statistic on the PCEt−1 variable, which is compared against the critical values tabulated for this test. Suppose the calculated τ value does not exceed the critical table value; in that case we fail to reject the null hypothesis, which indicates that the PCE time series is not stationary. If it is not stationary, using the variable in levels will lead to spurious regression results. As stated earlier, if a variable is not stationary in levels, we need to conduct the test on the variable in its difference form. If a variable that is not stationary in levels becomes stationary after differencing n times, the variable is said to be integrated of order n, written I(n). Suppose we repeat the preceding exercise using the first difference of PCE (i.e., ΔPCEt = PCEt − PCEt−1) as the dependent variable. If the test result now allows us to reject the null hypothesis, the first difference of PCE is stationary, and PCE itself is integrated of order one, I(1).
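A sketch of how such a test might be run with statsmodels (the series here is a simulated random walk rather than actual PCE data; the regression option names assume a recent statsmodels version):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(0, 1, 300))    # simulated random walk: has a unit root

for name, series in [("levels", y), ("first difference", np.diff(y))]:
    stat, pvalue, *_ = adfuller(series, regression="ct")   # drift + trend, as in (8.4)
    print(f"{name}: tau = {stat:.2f}, p-value = {pvalue:.3f}")
# Expected pattern: fail to reject in levels, reject for the first
# difference, i.e. the series is I(1).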
Note that taking the variables in difference form captures only the dynamic interaction among the variables, with no information about the long-run relationship. However, if variables that are individually non-stationary share the same trend, this indicates that they have a stationary linear combination. This in turn implies that the variables are cointegrated, i.e., that there exists a long-run equilibrium relationship among them.
1. Distinguish between trend stationary process (TSP) and a difference stationary process
(DSP)?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. What is meant by stationarity and unit roots?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What is meant by integrated time series?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
4. Discuss the concept of spurious regression
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Cointegration among the variables reflects the presence of a long-run relationship in the system. We need to test for cointegration because differencing the variables to attain stationarity produces a model that does not show the long-run behavior of the variables. Hence, testing for cointegration is the same as testing for a long-run relationship.
There are two common approaches to testing for cointegration: i) the Engle-Granger (two-step) procedure and ii) the Johansen approach.
The Engle-Granger (EG) method requires that, for cointegration to exist, all the variables must be integrated of the same order. Once the variables are found to have the same order of integration, the next step is testing for cointegration. This requires generating the residuals from the estimated static equation and testing their stationarity. By doing so we are testing whether the deviations from the long-run relationship (captured by the error term) are stationary or not. If the residuals are found to be stationary, the variables are cointegrated; this in turn ensures that any deviation from the long-run equilibrium relationship dies out with time.
Example: Suppose we regress PCE on PDI to obtain the following estimated relationship between the two:
PCEt = β0 + β1PDIt + Ut ……(8.6)
To identify whether PCE and PDI are cointegrated (i.e., have a stationary linear combination) or not, we write (8.6) as follows:
Ut = PCEt − β0 − β1PDIt ……(8.7)
The purpose of (8.7) is to check whether Ut [i.e., the linear combination (PCEt − β0 − β1PDIt)] is I(0), or stationary. Using the procedure stated in the earlier subunit for testing stationarity, if we reject the null hypothesis then we say that the variables PCE and PDI are cointegrated.
In short, provided we check that the residuals are stationary, the traditional regression
methodology that we have learned so far (including t and F tests) is applicable to data involving
time series.
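A sketch of the two-step procedure on simulated cointegrated series (illustrative values; statsmodels assumed available). Note that the residual-based test must be judged against the Engle-Granger critical values, which are more negative than the ordinary DF values; statsmodels also provides a coint() function that packages the whole procedure.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
n = 300
pdi = np.cumsum(rng.normal(0, 1, n))          # I(1) income series
pce = 10 + 0.8 * pdi + rng.normal(0, 1, n)    # shares pdi's stochastic trend

# Step 1: static long-run regression, as in (8.6)
step1 = sm.OLS(pce, sm.add_constant(pdi)).fit()
u_hat = step1.resid

# Step 2: unit-root test on the residuals (no constant: OLS residuals have mean 0)
tau = adfuller(u_hat, regression="n")[0]
print("tau on residuals:", round(tau, 2))     # strongly negative here: cointegrated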
We just showed that PCE and PDI are cointegrated; that is, there is a long-term equilibrium relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, one can treat the error term in (8.7) as the "equilibrium error". We can use this error term to tie the short-run behavior of PCE to its long-run value. In other words, the presence of cointegration makes it possible to model the variables (in first differences) through the error correction model (ECM). In this model, the one-period lagged value of the residual serves as the error correction term, and its coefficient captures the speed of adjustment to the long-run equilibrium. The following specification shows how the ECM works with the PCE/PDI example:
ΔPCEt = α0 + α1ΔPDIt + α2Ût−1 + εt ……(8.8)
where Ût−1 is the one-period lagged value of the residual from regression (8.6) and εt is the error term with the usual properties.
In (8.8), ΔPDIt captures the short-run disturbances in PDI, whereas the error correction term Ût−1 captures the adjustment toward long-run equilibrium. If α2 is statistically significant (it should be negative, lying between 0 and −1), it tells us what proportion of the disequilibrium in PCE in one period is corrected in the next period.
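A sketch of how (8.8) might be estimated, continuing the simulated PCE/PDI data from the cointegration sketch above (statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
pdi = np.cumsum(rng.normal(0, 1, n))
pce = 10 + 0.8 * pdi + rng.normal(0, 1, n)

u_hat = sm.OLS(pce, sm.add_constant(pdi)).fit().resid      # residuals from (8.6)
d_pce, d_pdi = np.diff(pce), np.diff(pdi)                  # first differences
X = sm.add_constant(np.column_stack([d_pdi, u_hat[:-1]]))  # lagged residual term
ecm = sm.OLS(d_pce, X).fit()
print(ecm.params)   # last coefficient is alpha2, expected to lie in (-1, 0)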
Example: Suppose we obtain the following result:
ΔPCEt = 11.69 + 0.29ΔPDIt − 0.08Ût−1 ……(8.9)
The error correction coefficient of −0.08 suggests that about 8 percent of the discrepancy between the actual and the long-run value of PCE is eliminated each period.
However, the Engle-Granger method is criticized for its failure on some issues that are addressed by the Johansen approach. Interested readers can find a detailed discussion of this more advanced approach in Harris (1995).
8.4 SUMMARY
In this very brief unit we discussed time series regression analysis. The discussion showed that most economic time series are non-stationary. Stationarity can be checked using the ADF test. A regression of one time series variable on one or more other time series variables can often give spurious results; this phenomenon is known as spurious regression. One way to guard against it is to find out whether the time series are cointegrated. Cointegration of two (or more) time series suggests that there is a long-run, or equilibrium, relationship between them. The Engle-Granger or Johansen approach can be used to find out whether two or more time series are cointegrated. Note also that the ECM is a means of reconciling the short-run behavior of an economic variable with its long-run behavior.
The answers to all questions are found in the discussion under subunits 8.2 and 8.3.