Econometrics Module 2
Contents
1.0 Aims and Objectives
1.1 Definition of Econometrics
1.2 Goals of Econometrics
1.3 Division of Econometrics
1.4 Methodology of Econometrics
1.5 The Nature and Sources of Data for Econometrics Analysis
1.5.1 Types of Data
1.5.2 The Sources of Data
1.6 Summary
1.7 Answers to Check Your Progress
1.8 References
1.9 Model Examination Questions
The purpose of this unit is to let you know what econometrics is all about and to discuss the
scope, goals, division and methodology of econometric analysis.
Definition: Econometrics deals with the measurement of economic relationships.
Econometrics is a combination of economic theory, mathematical economics and statistics, but
it is completely distinct from each one of these three branches of science. The relationships and
differences among these sciences are pointed out below.
A. Economic theory makes statements or hypotheses that are mostly qualitative in nature.
Example: Microeconomic theory states that, other things remaining the same, a reduction in the price
of a commodity is expected to increase the quantity demanded of that commodity. But the
theory itself does not provide any numerical measure of the relationship between the two; that
is, it does not tell by how much the quantity will go up or down as a result of a certain change in
the price of the commodity. It is the job of the econometrician to provide such numerical
statements.
The econometrician often needs special methods since the data are not generated as the result of
a controlled experiment. This creates special problems not normally dealt with in mathematical
statistics. Moreover, such data are likely to contain errors of measurement, and the
econometrician may be called upon to develop special methods of analysis to deal with such
errors of measurement.
To conclude: Econometrics is an amalgam of economic theory, mathematical economics,
economic statistics, and mathematical statistics. Yet, it is a subject that deserves to be studied in
its own right for the above-mentioned reasons.
2. Policy-Making
In many cases we apply the various econometric techniques in order to obtain reliable estimates
of the individual coefficients of the economic relationships from which we may evaluate
elasticities or other parameters of economic theory (multipliers, technical coefficients of
production, marginal costs, marginal revenues, etc.). The knowledge of the numerical value of
these coefficients is very important for the decisions of firms as well as for the formulation of
the economic policy of the government. It helps to compare the effects of alternative policy
decisions.
3. Forecasting
In formulating policy decisions it is essential to be able to forecast the value of the economic
magnitudes. Such forecasts will enable the policy-maker to judge whether it is necessary to
take any measures in order to influence the relevant economic variables.
Econometrics may be divided into two broad branches: theoretical econometrics and applied econometrics.
iii) the mathematical form of the model (the number of equations, the linear or non-linear form of
these equations, etc.).
The specification of the econometric model will be based on economic theory and on any
available information relating to the phenomenon being studied. The econometrician must know
the general laws of economic theory, and furthermore he must gather any other information
relevant to the particular characteristics of the relationship, as well as all studies already
published on the subject by other research workers.
The most common errors of specification are:
- the omission of some variables from the functions
- the omission of some equations
- the mistaken mathematical form of the functions.
Evaluation of Estimates
After the estimation of the model the econometrician must proceed with the evaluation of the
results of the calculations, that is, with the determination of the reliability of these results. The
evaluation consists of deciding whether the estimates of the parameters are theoretically
meaningful and statistically satisfactory. Various criteria may be used.
- Economic a priori criteria: These are determined by the principles of economic
theory and refer to the sign and the size of the parameters of economic relationships. In
econometric jargon we say that economic theory imposes restrictions on the signs and
values of the parameters of economic relationships.
- Statistical criteria: These are determined by statistical theory and aim at the
evaluation of the statistical reliability of the estimates of the parameters of the model.
The most widely used statistical criteria are the correlation coefficient and the
standard deviation (or standard error) of the estimates. These concepts will be
discussed in the subsequent units. Note that the statistical criteria are secondary to
the a priori theoretical criteria: in general, the estimates of the parameters should be
rejected if they have the wrong sign or size, even though they pass the statistical
criteria.
- Econometric criteria: These are determined by econometric theory. They aim at
investigating whether the assumptions of the econometric method employed are
satisfied in any particular case. When the assumptions of an econometric
technique are not satisfied, it is customary to respecify the model.
Therefore, the final stage of any applied econometric research is the investigation of the
stability of the estimates and of their sensitivity to changes in the size of the sample.
One way of establishing the forecasting power of a model is to use the estimates of the model
for a period not included in the sample. The estimated value (forecast value) is compared with
the actual (realized) magnitude of the relevant dependent variable. Usually there will be a
difference between the actual and the forecast value of the variable, which is tested with the
aim of establishing whether it is (statistically) significant. If after conducting the relevant test of
significance, we find that the difference between the realized value of the dependent variable
and that estimated from the model is statistically significant, we conclude that the forecasting
power of the model, its extra-sample performance, is poor.
Another way of establishing the stability of the estimates and the performance of the model
outside the sample of data from which it has been estimated, is to re-estimate the function with
an expanded sample, that is a sample including additional observations. The original estimates
will normally differ from the new estimates. The difference is tested for statistical significance
with appropriate methods.
Suppose the estimated demand function is Q̂t = 100 + 5Yt − 30Pt, where Y denotes income and
P price. This equation is then used for 'forecasting' the demand for the commodity in the year
1970, a period outside the sample data. Given Y1970 = 1000 and P1970 = 5,

Q̂1970 = 100 + 5(1000) − 30(5) = 4,950 units.

If the actual demand for this commodity in 1970 is 4,500, there is a difference of 450 between
the value estimated from the model and the actual market demand for the product. The difference can
be tested for significance by various methods. If it is found significant, we try to find out what
are the sources of the error in the forecast, in order to improve the forecasting power of our
model.
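This comparison is easy to script. Below is a minimal Python sketch (an illustration, not part of the original module) using the hypothetical demand equation above; a formal test of whether the forecast error is statistically significant would require its standard error, which is discussed in later units.

```python
# Minimal sketch of the forecast check described above, using the
# hypothetical demand equation Q_t = 100 + 5*Y_t - 30*P_t from the text.

def forecast_demand(income, price, b0=100.0, b_income=5.0, b_price=-30.0):
    """Point forecast from the estimated demand equation."""
    return b0 + b_income * income + b_price * price

predicted = forecast_demand(income=1000, price=5)   # 4950 units
actual = 4500                                       # realized 1970 demand
forecast_error = predicted - actual                 # 450 units

print(f"forecast = {predicted}, actual = {actual}, error = {forecast_error}")
```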
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. Let us first discuss the types of data and then we will see the sources and
limitations of the data.
b) Cross-Section data
These data give information on the variables concerning individual agents (consumers or
producers) at a given point of time.
Example:
- the census of population conducted by the CSA
- the survey of consumer expenditure conducted by Addis Ababa University
Note that, due to heterogeneity, cross-sectional data have their own problems.
c) Pooled Data
These are repeated surveys of a single (cross-section) sample in different periods of time. They
record the behavior of the same set of individual microeconomic units over time, and thus
contain elements of both time-series and cross-sectional data.
Panel (or longitudinal) data, also called micropanel data, is a special type of pooled data in
which the same cross-sectional unit is surveyed over time.
The individual (researcher) himself may collect data through interviews or using questionnaire.
In the social sciences the data that one generally obtains is non-experimental in nature; that is,
it is not subject to the control of the researcher. For example, data on GNP, unemployment, stock
prices etc are not directly under the control of the investigator. This often creates special
problems for the researcher in pinning down the exact cause or causes affecting a particular
situation.
Limitations
Although there is plenty of data available for economic research, the quality of the data is often
not that good. Reasons are:
- Since most social science data are not experimental in nature, there is the possibility of
observational errors.
- Errors of measurement arising from approximations and round-offs.
- In questionnaire type surveys, there is the problem of non-response
- Respondents may not answer all the questions correctly
- Sampling methods used in obtaining data
- Economic data is generally available at a highly aggregate level. For example most macro
data like GNP, unemployment, inflation etc are available for the economy as a whole.
- Because of confidentiality, certain data can be published only in highly aggregate form. For
example, data on individual tax, production, employment, etc. at the firm level are usually
available only in aggregate form.
Because of all these and many other problems, the researcher should always keep in mind that
the results of research are only as good as the quality of the data. The results of the
research may therefore be unsatisfactory due to the poor quality of the available data rather
than due to a wrong model.
1.6. SUMMARY
Definition of Econometrics
Economic theory, mathematical economics and statistics
Methodology of econometrics:
C) Evaluation of Estimates
Criteria for evaluation of the estimates
- Economic a priori criteria: determined by the principles of economic theory; they refer to
the sign and the size of the parameters of economic relationships.
- Statistical criteria: determined by statistical theory; the most widely used are the
correlation coefficient and the standard deviation (or standard error) of the estimates.
- Econometric criteria: determined by econometric theory.
Types of Data
There are three types of data:
A) Time series data (which may be quantitative, or qualitative and coded as dummy or categorical variables)
B) Cross-section data
C) Pooled data (panel data being a special case)
The Sources of Data
3. The results of research are only as good as the quality of the data. Explain it.
4. Mention some of the reasons for the poor forecasting power of the estimated model.
Contents
2.0 Aims and Objectives
2.1 The Concept of Regression Analysis
2.2 Population Regression Function Vs Sample Regression Function
2.3 The Method of Ordinary Least Squares
2.4 Statistical Test of Significance and Goodness of Fit
2.5 Confidence Interval and Prediction
2.6 Summary
2.7 Answers to Check Your Progress
2.8 Model Examination
2.9 References
This unit introduces the key idea behind regression analysis. The objective of such analysis is
to estimate and/or predict the mean or average value of the dependent variable on the basis of
the known or fixed values of the explanatory variables.
After studying this unit you should be able to apply the ordinary least squares method in a
two-variable regression analysis and interpret the results.
Regression analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variables, with a view to
estimating and/or predicting its mean value in terms of the known or fixed values of the latter.
Basically, the existence of the disturbance term is justified in four main ways.
i) Omission of other variables: although income might be the major determinant of the
level of consumption, it is not the only determinant. Other variables such as interest
rate, or liquid asset holdings may have a systematic influence on consumption. Their
omission constitutes one type of specification error, and the disturbance term is often
viewed as capturing the net effect of such omitted variables.
ii) Measurement error: it may be the case that the variable being explained cannot be
measured accurately, either because of data collection difficulties or because it is
inherently unmeasurable, so that a proxy variable must be used instead. The disturbance
term can in these circumstances be thought of as representing this measurement error
of the variable(s).
iii) Randomness in human behavior: humans are not machines that will do as instructed,
so there is an unpredictable element. For example, for reasons left unexplained, an increase in
income may not influence consumption. The disturbance term captures such human
behavior that is left unexplained by the economic model.
iv) Imperfect specification of the model: for example, we may have linearized a non-linear
function; if so, the random term will reflect this wrong specification.
Generally speaking regression analysis is concerned with the study of the dependency of one
dependent variable on one or more other variables called the explanatory variable(s) or the
independent variable(s). Moreover, the true relationship that connects the variables involved is
split into two components: systematic (or explained) variation and random (or unexplained)
variation. Using (2.2) we can disaggregate the two components as follows:

Y = β0 + β1X + U

That is,

[variation in Y] = [systematic variation] + [random variation]
In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear model like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in two
different ways. These are,
a) Linearity in the variables
b) Linearity in parameters
That is, dY/dX = 2β1X. Hence the above function is not linear in X, since the variable X
appears with a power of 2.
ii) Linearity in the parameters: this implies that the parameters (i.e., the β's) are raised to the
first degree only. In this interpretation Y = β0 + β1X² is a linear regression model, but
Y = β0 + β1²X is not. The latter is an example of a non-linear (in the parameters) regression
model. Of the two interpretations of linearity, linearity in the parameters is the one relevant for
the development of regression theory. Thus the term linear regression means a regression that
is linear in the parameters, the β's; it may or may not be linear in the explanatory variables.
The following discussion stress that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s).
[Figure 2.1: the population regression line (PRF), plotting the conditional mean of Y against X]
Figure 2.1 (the above line) is known as the population regression line or, more generally, the
population regression curve. Geometrically, a population regression curve is simply the locus of
the conditional means or expectations of the dependent variable for the fixed value of the
explanatory variables.
From the preceding discussion it is clear that each conditional mean E(Y|Xi) is a function of Xi:

E(Y|Xi) = f(Xi) = β0 + β1Xi ....................................... (2.3)
Note that in real situations we do not have the entire population available for examination. Thus
the functional form that f(X) assumes is an important question. This is an empirical question,
although in specific cases theory may have something to say.
E(Y|Xi) = β0 + β1Xi
where β0 and β1 are unknown but fixed parameters known as the regression coefficients
(intercept and slope coefficients respectively). The above equation is known as the linear
population regression function. But since consumption expenditure does not necessarily
increase as income level increases we incorporate the error term. That is,
Yi = E(Y|Xi) + Ui
   = β0 + β1Xi + Ui .......................................................(2.4)
Note that in table 2.1 we observe that for the same value of X (e.g. 100) we have different values
of Y (65, 70, 75 and 80). Thus the value of Y is also affected by other factors that can be
captured by the error term, U.
If we take the expected value of (2.4) conditional on Xi, we obtain

E(Yi|Xi) = E(Y|Xi) + E(Ui|Xi) .................................(2.5)

Since E(Yi|Xi) = E(Y|Xi), it implies that

E(Ui|Xi) = 0 ...............................(2.6)
Thus, the assumption that the regression line passes through the conditional means of Y implies
that the conditional mean values of Ui (conditional upon the given X's) are zero.
The regression function based on a sample collected from the population is called sample
regression function (SRF).
[Figure 2.2: Sample regression function, plotting consumption expenditure against income]
Hence, analogous to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample regression function (which is a counterpart of the PRF stated earlier) may be written as:
Ŷi = β̂0 + β̂1Xi + Ûi ...............................(2.7)

where β̂0 and β̂1 are estimators of β0 and β1, and Ûi is an estimator of Ui.
To sum up, because our analysis is based on a single sample from some population our primary
objective in regression analysis is to estimate the PRF given by
Yi = β0 + β1Xi + Ui
on the basis of
Ŷi = β̂0 + β̂1Xi + Ûi
We employ SRF because in most of the cases our analysis is based upon a single sample from
some population. But because of sampling fluctuations our estimate of the PRF based on the
SRF is at best an approximate one.
For Xi to the left of the point A, the SRF will underestimate the true PRF. Such over- and
underestimation is inevitable because of sampling fluctuations.
Note that there are several methods of constructing the SRF, but as far as regression analysis is
concerned, the method that is used most extensively is that of Ordinary Least Squares (OLS).
In other words, how should the SRF be constructed so that β̂0 is as “close” as possible to the
true β0 and β̂1 is as “close” as possible to the true β1, even though we never know the true β0
and β1? We can develop procedures that tell us how to construct the SRF to mirror the PRF as
faithfully as possible, even though we never actually determine the PRF itself.
The method of ordinary least squares has some very attractive statistical properties that have
made it one of the most powerful and popular methods of regression analysis.
Thus β0 + β1X represents the systematic (explained) variation, and Ui the random (unexplained) variation.
However, the PRF is not directly observable. Hence we estimate it from the SRF. That is,

Yi = β̂0 + β̂1Xi + Ûi = Ŷi + Ûi

so that

Ûi = Yi − Ŷi = Yi − β̂0 − β̂1Xi

This shows that the Ûi (the residuals) are simply the differences between the actual and the
estimated Y values.
The diagram below reveals this relationship. Note from the figure that Ûi represents the
difference between the actual Y and the Ŷ on the SRF. The Ûi could be widely spread about the
SRF, but this is discouraged under the least-squares procedure, for the larger the Ûi (in absolute
value), the larger the Ûi².
In other words, the least-squares method allows us to choose β̂0 and β̂1 as estimators of β0 and β1,
respectively, so that

Σ(Yi − β̂0 − β̂1Xi)²

is a minimum.
If the deviation of the actual values from the estimated ones is at its minimum, then our
estimation from the collected sample provides a very good approximation of the true
relationship between the variables.
Note that to estimate the coefficients β0 and β1 we need observations on X, Y and U. Yet U is
never observed like the other variables, and therefore in order to estimate the function
Yi = β0 + β1Xi + Ui we should make some reasonable (plausible) assumptions about the shape
of the distribution of each Ui (i.e., its mean, variance and covariance with other U's). These
assumptions are guesses about the true, but unobservable, values of Ui.
Thus the linear regression model is based on certain assumptions, some of which refer to the
distribution of the random variable Ui, some to the relationship between U i and the explanatory
variables, and finally some refer to the relationship between the explanatory variables
themselves. The following are the assumptions underlying the method of least squares.
Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X (explanatory) values are fixed in repeated sampling. Values taken by the
regressor X are considered fixed in repeated samples; more technically, X is assumed to be
non-stochastic. In other words, our regression analysis is conditional regression analysis, that
is, conditional on the given values of the regressor(s) X.
E.g., recall that for a fixed X value of 100 we have Y values of 65, 70, 75 and 80. Hence X
is assumed to be non-stochastic.
Assumption 3: Ui is a random real variable. The value which U may assume in any one period
depends on chance. It may be positive, negative or zero. Each value has a certain probability of
being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. This means that for each value of X, U may
assume various values, some greater than zero and some smaller than zero, but if we consider
all the possible values of U, for any given value of X, they would have an average value equal
to zero. Hence the mean or expected value of the random disturbance term U i is zero.
Symbolically, we have:

E(Ui|Xi) = 0

That means the mean value of Ui conditional upon the given Xi is zero. For example, from
table 2.1 we can show that when X = 100 the values of Ui are −7.5, −2.5, 2.5 and 7.5, so their
average is zero.
Assumption 5: Homoscedasticity or equal variance of Ui. Symbolically,

Var(Ui|Xi) = E[Ui − E(Ui|Xi)]² = E(Ui²|Xi) = σ² ..........................................(2.9)
Recall that the last equality holds because of assumption 4. Equation (2.9) states that the
variance of Ui for each Xi is some positive constant number equal to σ² (equal variance). This
means that the Y populations corresponding to the various X values have the same variance.
Consider the following figures.
[Figure 2.5: Variance of the error term for each Xi; panel (a) shows equal variance at X1, X2, X3, panel (b) unequal variance]
Note that in both cases the distribution of the error term is normal; that is, the values of U (for
each Xi) have a bell-shaped symmetrical distribution about their zero mean. But in figure (a)
the variance of the error term (and hence of the Y values) is equal in each period (i.e., at all
values of X), whereas in figure (b) there is unequal spread or variance of the error term. The
latter situation is known as heteroscedasticity.
Under heteroscedasticity,

Var(Ui|Xi) = σi²

where the subscript i on σ² indicates that the variance of the Y population is no longer constant.
To understand the rationale behind this assumption, refer to figure (b), where
Var(U|X1) < Var(U|X3).
Therefore, the likelihood is that the Y observations coming from the population with X = X1
would be closer to the PRF than those coming from the population corresponding to X = X3. In
short, all Y values corresponding to the various X's will not be equally reliable, reliability being
judged by how closely or distantly the Y values are distributed around their means.
Stated differently this assumption is saying that all Y values corresponding to the various X’s
are equally important since they have the same variance. Thus assumption 5 implies that the
conditional variances of Yi are also homoscedastic. That is,

Var(Yi|Xi) = σ²
Combining this with the normality noted above, the random variable Ui has the distribution
Ui ~ N(0, σ²); that is, Ui is normally distributed with zero mean and constant variance σ².
Assumption 6: No autocorrelation between the disturbances. Given any two X values, Xi and
Xj (i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is zero. This implies that the error
term committed for the ith observation is independent of the error term committed for the jth
observation. This is also known as no serial correlation.
Symbolically,

cov(Ui, Uj|Xi, Xj) = E{[Ui − E(Ui)]|Xi}·{[Uj − E(Uj)]|Xj}
                  = E(Ui|Xi)·E(Uj|Xj)
                  = 0

Note that if i = j then we are dealing with assumption five, because E(Ui|Xi)(Uj|Xj) then
becomes E(Ui²) = σ².
No autocorrelation implies that, given X, the deviations of any two Y values from their means
do not exhibit a systematic pattern such as the ones shown in the following figure. Figures (a)
and (b) would imply that Ui depends on Uj, so that Yt = β0 + β1Xt + Ut depends not only on Xt
but also on Ut−1, since Ut−1 to some extent determines Ut. Note that figure (c) shows no
systematic pattern in the U's, thus indicating zero correlation.
Assumption 7: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the error term is
independent of the explanatory variable(s). If the two are uncorrelated, it means that X and U
have separate influences on Y; but if X and U are correlated, it is not possible to assess their
individual effects on Y. Since we have assumed that the X values are fixed (or non-random) in
repeated samples, there is no way for X to co-vary with the error term. Thus, assumption 7 is
not very crucial.
Assumption 8: The regression model is correctly specified.
This assumption implies that there is no specification bias or error in the model used in
empirical analysis. It means that we have included all the important regressors explicitly in
the model and that its mathematical form is correct.
Assumption 9: There is no perfect multicollinearity.
That is, there are no perfect linear relationships among the explanatory variables. This
assumption applies in the case of multiple linear regression: if there is more than one
explanatory variable in the relationship, it is assumed that they are not perfectly correlated with
each other. Indeed, the regressors should not even be strongly correlated; they should not be
highly multicollinear.
Assumption 10: The number of observations, n must be greater than the number of parameters
to be estimated. Alternatively the number of observations must be greater than the number of
explanatory variables.
At this point one may ask: 'how realistic are these assumptions, really?' Note that in any
scientific study we make certain assumptions because they facilitate the development of the
subject matter in a gradual step, not because they are necessarily realistic in the sense that they
replicate reality exactly. What we plan to do is first study the properties of Classical Linear
Regression Model thoroughly and then in unit four we examine what happens if one or more of
the assumptions are not fulfilled.
Note that the OLS method demands that the deviations of the actual from the estimated Y-values
(i.e., Yi − Ŷi) be as small as possible. This method provides us with unique estimates of β0 and β1.
That is, the sum of squared residuals is to be minimized with respect to β̂0 and β̂1:

∂ΣÛi²/∂β̂0 = 0 .....................................(2.10)

and

∂ΣÛi²/∂β̂1 = 0 ..................................... (2.11)

The partial differentiation of (2.8) with respect to β̂0 gives

∂ΣÛi²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) = 0 .................... (2.12)

In the same way, the partial differentiation of (2.8) with respect to β̂1 gives

∂ΣÛi²/∂β̂1 = −2ΣXi(Yi − β̂0 − β̂1Xi) = 0 ........................ (2.13)
Rearranging (2.12) and (2.13) we obtain the normal equations:

ΣYi = nβ̂0 + β̂1ΣXi ......................................... (2.14)

ΣXiYi = β̂0ΣXi + β̂1ΣXi² ......................................... (2.15)

From (2.15),

β̂1 = (ΣXiYi − β̂0ΣXi)/ΣXi² .......................................... (2.16)

Substituting (2.16) in place of β̂1 in (2.14) and solving, we get

β̂0 = (ΣXi²·ΣYi − ΣXi·ΣXiYi)/(nΣXi² − (ΣXi)²) .......................................... (2.17)

Equivalently, from (2.14),

β̂0 = (ΣYi − β̂1ΣXi)/n = Ȳ − β̂1X̄ ....................................... (2.19)

Solving the normal equations for the slope gives

β̂1 = (nΣXiYi − ΣXiΣYi)/(nΣXi² − (ΣXi)²) ......................................... (2.20)
Therefore, equations (2.17) or (2.19) and (2.20) are the least square estimates since they are
obtained using the least square criteria.
Note that (2.20) is expressed in terms of the original sample observations on X and Y. It can be
shown that the estimate β̂1 may also be obtained by the following formula, which is expressed in
deviations of the variables from their means:

β̂1 = Σxiyi/Σxi² ............................................. (2.21)

where xi = Xi − X̄ and yi = Yi − Ȳ.
In other words,

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² ............................................ (2.22)
Proof:

Σxiyi = Σ(Xi − X̄)(Yi − Ȳ)
      = ΣXiYi − ȲΣXi − X̄ΣYi + nX̄Ȳ
      = ΣXiYi − (ΣXi·ΣYi)/n

since X̄ = ΣXi/n and Ȳ = ΣYi/n. Similarly,

Σxi² = Σ(Xi − X̄)² = ΣXi² − 2X̄ΣXi + nX̄² = ΣXi² − (ΣXi)²/n

Hence,

β̂1 = Σxiyi/Σxi² = [ΣXiYi − (ΣXi·ΣYi)/n] / [ΣXi² − (ΣXi)²/n]
   = (nΣXiYi − ΣXiΣYi)/(nΣXi² − (ΣXi)²)

which is (2.20).
In this event we should estimate the function Y = β0 + β1X + U by imposing the restriction
β0 = 0. This is a restricted minimization problem: we minimize

ΣÛi² = Σ(Y − β̂0 − β̂1X)²

subject to β̂0 = 0.
Note that estimation of elasticities is possible from an estimated regression line. Recall that the
SRF Ŷi = β̂0 + β̂1Xi is the equation of a line whose intercept is β̂0 and whose slope is β̂1. The
point elasticity of Y with respect to X, evaluated for example at the sample means, is then β̂1·(X̄/Ȳ).
In passing note that the least square estimators (i.e. ˆ 0 and ̂ 1 ) are point estimators, that is,
given the sample, each estimator will provide only a single (point) value of the relevant
population parameter.
In conclusion, the regression line obtained using the least squares estimators has the following
properties:
i) It passes through the sample means of Y and X. Recall that we obtained β̂0 = Ȳ − β̂1X̄, which
can be rewritten as Ȳ = β̂0 + β̂1X̄.
Example: Consider the following table, which is constructed using raw data on X and Y,
where the sample size is 10.

Yi     Xi     YiXi     Xi²      xi = Xi−X̄   yi = Yi−Ȳ   xi²     xiyi
70     80     5600     6400     −90          −41          8100    3690
65     100    6500     10000    −70          −46          4900    3220
90     120    10800    14400    −50          −21          2500    1050
95     140    13300    19600    −30          −16          900     480
110    160    17600    25600    −10          −1           100     10
115    180    20700    32400    10           4            100     40
120    200    24000    40000    30           9            900     270
140    220    30800    48400    50           29           2500    1450
155    240    37200    57600    70           44           4900    3080
150    260    39000    67600    90           39           8100    3510
Sum    1110   1700     205500   322000       0            0       33000   16800
Mean   111    170      –        –            0            0       –       –

Note that columns 3 to 8 in the above table are constructed using the information given in
columns 1 and 2.
We can compute β̂0 for the above tabulated figures by applying the formula given in (2.17); that is,

β̂0 = [(322,000)(1,110) − (1,700)(205,500)] / [10(322,000) − (1,700)²] = 8,070,000/330,000 ≈ 24.45

Similarly, we can compute β̂1 by using the formula given in (2.20) or (2.21). Using (2.20), we obtain:

β̂1 = 168,000/330,000 = 0.51
Notice that once we compute β̂1, we can also calculate β̂0 very easily using (2.19):
β̂0 = Ȳ − β̂1X̄ = 111 − (0.51)(170) ≈ 24.4. The estimated regression line is therefore

Ŷi = 24.4 + 0.51Xi .......... (2.26)

Interpretation of (2.26) reveals that when family income increases by 1 Birr, the estimated
consumption expenditure increases by β̂1 = 0.51 Birr, i.e., 51 cents.
The value of β̂0 = 24.4 (which is the intercept) indicates the average level of consumption
expenditure when family income is zero.
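As an aside, the computations above can be verified with a few lines of code. The following minimal Python sketch (illustrative, not part of the original module) reproduces β̂0 and β̂1 from the raw data of the table using formulas (2.19) and (2.21):

```python
# OLS estimates from the ten (Y, X) pairs in the table above,
# using the deviation formulas (2.19) and (2.21).

Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

n = len(Y)
x_bar = sum(X) / n          # 170
y_bar = sum(Y) / n          # 111

# deviations from the means
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]

b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)  # (2.21)
b0 = y_bar - b1 * x_bar                                               # (2.19)

print(f"slope b1 = {b1:.4f}")      # ~0.5091, i.e. 0.51
print(f"intercept b0 = {b0:.4f}")  # ~24.4545, i.e. 24.4
```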
As noted in the previous discussion, given the assumptions of the classical linear regression
model, the least squares estimators possess some ideal or optimum properties. These properties
are contained in the well-known Gauss-Markov theorem.
To understand this theorem, we need to consider the best linear unbiasedness property of an
estimator. That is, an estimator, say the OLS estimator β̂i, is said to be the best linear unbiased
estimator (BLUE) of βi if the following hold:
i. Linear estimator: it is a linear function of a random variable such as the dependent variable Y.
For example, the sample mean

Ȳ = (1/n)ΣYi = (Y1 + Y2 + ... + Yn)/n

is a linear function of the Yi.
ii. Unbiased estimator: an estimator is said to be unbiased if its average or expected value
equals the true parameter; that is, E(β̂) = β.
Figure 2.7 Unbiased and biased estimators (Using the sampling distribution to illustrate bias)
iii. Minimum Variance estimator (or best estimator) An estimator is best when it has the
smallest variance as compared with any other estimate obtained from econometric
methods. Symbolically, β̂ is best if

E[β̂ − E(β̂)]² < E[β̃ − E(β̃)]²

or, more formally,

Var(β̂) < Var(β̃)

for any alternative estimator β̃.
An unbiased estimator with the least variance is known as an efficient estimator.
The following figure shows the sampling distributions of two alternative estimators, β̂ and β̃.
Notice that the property of minimum variance in itself is not important. An estimate may have a
very small variance and a large bias: we have a small variance around the “wrong” mean.
Similarly, the property of unbiasedness by itself is not particularly desirable, unless coupled
with a small variance.
We can prove that the least squares estimators are BLUE provided that the random term U
satisfies some general assumptions, namely that U has zero mean and constant variance.
This proposition, together with the set of conditions under which it is true, is known as the
Gauss-Markov least squares theorem.
[Figure: nested boxes showing that the set of all estimators contains the set of all linear estimators, which in turn contains the set of linear unbiased estimators]
The box above reveals that not all estimators are linear, and not all linear estimators are
unbiased. The unbiased linear estimators are a subset of the linear estimators, and within the
group of linear unbiased estimators the OLS estimator β̂ has the smallest variance. Hence OLS
possesses three properties, namely linearity, unbiasedness and minimum variance.
We now derive the variances of the least squares estimates, β̂0 and β̂1. First note that β̂1 is a
linear function of the Yi:

β̂1 = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi²
   = ΣxiYi/Σxi²    (since Σxi = 0)
   = ΣKiYi

where Ki = xi/Σxi².
Var(β̂1) = E[β̂1 − E(β̂1)]²

Notice that since E(β̂1) = β1, it follows that

Var(β̂1) = E(β̂1 − β1)²

By rearranging (2.29), the above result can be written as

Var(β̂1) = E(ΣKiUi)²
        = E(K1²U1² + K2²U2² + ... + Kn²Un² + 2K1K2U1U2 + ... + 2Kn−1KnUn−1Un)

Recall that Var(U) = σu² = E[Ui − E(Ui)]², which is equal to E(Ui²) because E(Ui) = 0.
Furthermore, E(UiUj) = 0 for i ≠ j. Thus it follows that

Var(β̂1) = σu²ΣKi² = σu²/Σxi² .................................................... (2.31)
Thus, the standard error (s.e.) of β̂1 is given by

s.e.(β̂1) = σu/√(Σxi²) ................................................... (2.32)
It follows that the variance (and s.e.) of β̂0 can be obtained following the same line of
reasoning. Recall from (2.19) that β̂0 = Ȳ − β̂1X̄. Moreover, averaging the PRF over the sample
gives Ȳ = β0 + β1X̄ + Ū. Substituting this for Ȳ we obtain

β̂0 = β0 + β1X̄ + Ū − β̂1X̄ = β0 − (β̂1 − β1)X̄ + Ū

Therefore,

Var(β̂0) = E(β̂0 − β0)²
        = E[−(β̂1 − β1)X̄ + Ū]²
        = E[X̄²(β̂1 − β1)² + Ū² − 2X̄(β̂1 − β1)Ū] ................................ (2.33)
        = X̄²E(β̂1 − β1)² + E(Ū²) − 2X̄E[(β̂1 − β1)Ū]
Note that

E(Ū²) = E[(U1 + U2 + ... + Un)/n]² = (1/n²)E(U1 + U2 + ... + Un)²

Since E(Ui²) = σu² for each i and E(UiUj) = 0 for i ≠ j, the cross terms vanish and

E(Ū²) = (1/n²)(nσu²) = σu²/n

For the same reason, E[(β̂1 − β1)Ū] = 0.
Therefore, using this information we can evaluate (2.33) to obtain

Var(β̂0) = X̄²(σu²/Σxi²) + σu²/n
        = σu²(1/n + X̄²/Σxi²)

or, since Σxi² + nX̄² = ΣXi²,

Var(β̂0) = σu²·ΣXi²/(nΣxi²) .................................................. (2.34)

and hence

s.e.(β̂0) = σu·√(ΣXi²/(nΣxi²)) ................................................. (2.35)
Moreover, the covariance between β̂0 and β̂1 describes how β̂0 and β̂1 are related:

Cov(β̂0, β̂1) = E[β̂0 − E(β̂0)]·[β̂1 − E(β̂1)]

Using the expression β̂0 − β0 = −X̄(β̂1 − β1) + Ū obtained above, we can rewrite this as

Cov(β̂0, β̂1) = E[(−X̄(β̂1 − β1) + Ū)(β̂1 − β1)]
            = E[Ū(β̂1 − β1)] − X̄E(β̂1 − β1)²
            = 0 − X̄E(β̂1 − β1)² = −X̄·σu²/Σxi²
Note from (2.31) and (2.34) that the formulas for the variances of β̂0 and β̂1 involve the
variance of the random term U, σu². However, the true variance of Ui cannot be computed since
the values of Ui are not observable. But we may obtain an unbiased estimate of σu² from the
expression

σ̂u² = ΣÛi²/(n − k) ................................................. (2.37)

where k (which is 2 in this case) stands for the number of parameters and hence n − k represents
the degrees of freedom.
Remember that

ΣÛi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1Xi)² .............. (2.38)

a) We define the total variation of the observed Yi around their mean as the sum of squared
deviations yi = Yi − Ȳ:

[Total variation] = Σyi² = Σ(Yi − Ȳ)² .............. (2.39)

We squared the simple deviations because Σyi = 0.
b) In the same way, we define the deviations of the regressed (i.e., estimated from the line)
values Ŷi from the mean value, ŷi = Ŷi − Ȳ. This is the part of the total variation of Yi
which is explained by the regression line. Thus, the sum of the squares of these deviations
is the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi − Ȳ)² ............................... (2.40)
c) Recall that we have defined the error term Ûi as the difference Ûi = Yi − Ŷi. This is the part
of the variation of the dependent variable which is not explained by the regression line and is
attributed to the existence of the disturbance variable U. Thus the sum of the squared
residuals gives the total unexplained variation of the dependent variable Y around its mean:

[Unexplained variation] = ΣÛi² = Σ(Yi − Ŷi)² ...................................... (2.41)
Putting the pieces together, we have the fundamental decomposition

Σyi² = Σŷi² + ΣÛi² ....................................................... (2.44)

[Total variation] = [Explained variation] + [Unexplained variation] ............... (2.45)
Note that because the OLS estimator minimizes the sum of squared residuals (i.e., the
unexplained variation), it automatically maximizes r². Thus maximization of r² as a criterion of
an estimator is formally identical to the least squares criterion. Dividing (2.44) by TSS on both
sides, we obtain

1 = ESS/TSS + RSS/TSS
In terms of (2.44) the above result can be rewritten as

1 = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)² + ΣÛi²/Σ(Y − Ȳ)² ........................................... (2.46)
We now define r² as

r² = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)² = Σŷi²/Σyi² ..................................................... (2.47)
Notice that (2.47) is nothing but ESS/TSS. Thus r², which is the square of the correlation
coefficient r, determines the proportion of the variation of Y which is explained by variations
in X. For this reason r² is also called the coefficient of determination: it measures the proportion
of the total variation in Y explained by the regression model.
Equivalently,

r² = 1 − RSS/TSS = 1 − ΣÛi²/Σ(Y − Ȳ)² ................................................. (2.48)
The relationship between r² and the slope β̂1 indicates that r² may be computed in various
ways, given by the following formulas:

r² = β̂1·Σxy/Σy² ................................................... (2.49)

   = β̂1²·Σx²/Σy² ................................................... (2.50)
Note that if we are working with cross-section data, an r² value equal to 0.5 may be a good fit,
but for time-series data 0.5 may be too low. This means that there is no hard-and-fast rule as to
how high r² should be. Generally, however, the higher the value of r², the better the fit.
Replacing σu² by σ̂u² = ΣÛi²/(n − k), where n = 10 and k = 2, we get σ̂u² = 42.16. Then

Var(β̂0) = 42.16(322,000)/(10(33,000)) = 41.13

Var(β̂1) = 42.16/33,000 = 0.0013

We can calculate r² by using (2.47), (2.48), (2.49) or (2.50). For this example we use (2.49)
and (2.50):

Using (2.49), r² = (0.51)(16,800)/8,890 = 0.96

Using (2.50), r² = (0.51)²(33,000)/8,890 = 0.96
In addition to r², the statistical reliability of the estimates (β̂0, β̂1) should be tested. That is,
since β̂0 and β̂1 are sample estimates of the parameters β0 and β1, the significance of the
parameter estimates should be examined. Note that, given the assumption of a normally
distributed error term, the least squares estimators are themselves normally distributed.
Among a number of tests in this regard, we will examine the standard error test. This test helps
us to decide whether the estimates β̂0 and β̂1 are significantly different from zero, i.e.,
whether the sample from which they have been estimated might have come from a population
whose true parameters are zero (β0 = 0 and/or β1 = 0). Formally, we test the null hypothesis
H0: βi = 0 against the alternative H1: βi ≠ 0.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
Sometimes we have a strong a priori or theoretical expectation (or an expectation based on some
previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather
than two-sided, as just discussed.
For instance, in a consumption-income function C = β0 + β1Y one could postulate that:

H0: β1 ≤ 0.3
H1: β1 > 0.3

That is, perhaps economic theory or prior empirical work suggests that the marginal propensity
to consume (β1) is greater than 0.3. [Note: Students are strongly advised to refer to and grasp the
discussion in units 7 and 8 of the course Statistics for Economics.]
Recall that in order to test a hypothesis of the kind discussed above we need to make use of the
Z- and t-tests.
b) The Z-test of the least squares estimates
Recall from the Statistics for Economics course that the Z-test is applicable only if
a) the population variance is known, or
b) the population variance is unknown, and provided that the sample size is sufficiently
large (n > 30).
In econometric applications the population variance of Y is unknown. However, if we have a
large sample (n > 30) we may still use the standard normal distribution and perform the Z test.
If these conditions cannot be fulfilled, we apply the student’s t-test.
Recall that in our Statistics for Economics course we learned the formula which transforms the
value of any variable X into t units:

t = (Xi − X̄)/SX

where SX is the sample standard deviation and n is the sample size.
Accordingly, the variable

t = (β̂i − βi)/√Var(β̂i) = (β̂i − βi)/s.e.(β̂i) ............................................... (2.51)

follows the t distribution with n − k degrees of freedom, where

βi = hypothesized value of βi
Var(β̂i) = estimated variance of β̂i (from the regression)
n = sample size
k = total number of estimated parameters
If the computed value t* falls in the rejection region, i.e., |t*| > tα/2 with n − k degrees of
freedom, we reject the null hypothesis; that is, we accept that the estimate β̂i is statistically
significant.

[Figure: the t distribution, with the acceptance region between −tα/2 and tα/2 and a rejection region in each tail]

If, on the other hand, t* falls in the acceptance region, we accept H0, which implies that β̂i has
an insignificant or marginal contribution to the model.
Recall that if it is a one-tailed test, the rejection region is found only on one side. Hence, we
reject H0 if t* > tα (or, for a left-tailed test, if t* < −tα).
Note that the t-test can be performed in an approximate way by simple inspection. For
(n − k) > 8, if the observed t* is greater than 2 (or smaller than −2), we reject the null
hypothesis at the 5 percent level of significance. If, on the other hand, the observed t* is
smaller than 2 (but greater than −2), we accept the null hypothesis at the 5% level of significance.
Given (2.51), the sample value of t* would be greater than 2 if the relevant estimate (β̂0 or β̂1)
is at least twice its standard error. In other words, we reject the null hypothesis if

t* > 2, i.e., if β̂i > 2 s.e.(β̂i) ........................................................... (2.53)

or, equivalently, if s.e.(β̂i) < β̂i/2.
Example: Suppose that from a sample of size n = 20 we estimate the following consumption
function:

Ĉ = 100 + 0.70Y
    (75.5)  (0.21)

where the figures in parentheses are standard errors. For β̂0 = 100 with s.e.(β̂0) = 75.5,

t* = 100/75.5 = 1.32

For β̂1 = 0.70 with s.e.(β̂1) = 0.21,

t* = 0.70/0.21 = 3.3
Note that for β0, since the calculated value (1.32) is less than the table value (2.10), we cannot
reject H0: β0 = 0. Thus the estimate of β0 is statistically insignificant. But for β̂1, since the
calculated value (3.3) is greater than the table value (2.10), we reject H0: β1 = 0, indicating
that the estimate of β1 is indeed significant in the relationship between the two variables.
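A small Python sketch of this standard-error test for the consumption function above (illustrative, not part of the module; the critical value 2.101 for 18 degrees of freedom is taken from a t-table):

```python
# Standard-error (t) test for C_hat = 100 + 0.70*Y, n = 20, k = 2.

n, k = 20, 2
t_critical = 2.101  # two-tailed 5% critical value for n - k = 18 df (t-table)

estimates = {"b0": (100.0, 75.5), "b1": (0.70, 0.21)}  # (estimate, s.e.)

for name, (b, se) in estimates.items():
    t_star = b / se          # test of H0: coefficient = 0
    verdict = "reject" if abs(t_star) > t_critical else "do not reject"
    print(f"{name}: t* = {t_star:.2f} -> {verdict} H0")
```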
In conclusion, note that if a researcher obtains a high r² value and the estimates have low
standard errors, then the result is good. In practice, however, such an ideal situation is rare.
Rather, we may have a low r² value and low standard errors, or a high r² value but high
standard errors. There is no agreement among econometricians in this case, so the main issue is
whether to give priority to a high r² or to low standard errors of the parameter estimates.
In general, r² is more important if the model is to be used for forecasting. Standard errors
become more important when the purpose of the exercise is the explanation or analysis of
economic phenomena and the estimation of reliable values of the economic relationship.
Recall what we have said about constructing confidence interval in the course “Statistics for
Economics”. We said that in confidence interval analysis first we determine the probability
level. This is referred to as the confidence level (or confidence coefficient). Usually the 95%
confidence level is chosen. This means that in repeated sampling the confidence limits,
computed from the sample, would include the true population parameter in 95 percent of the
cases. In the other 5 percent of the cases the population parameter will fall outside the
confidence limit.
The confidence interval can be constructed by the standard normal distribution or the t-
distribution
i) Confidence Interval from the Standard Normal Distribution (Z-distribution)
Recall that the Z-distribution may be employed either if we know the true standard deviation
(of the population) or when we have a large sample (n > 30). This is because, for large samples,
the sample standard deviation is a reasonably good estimate of the unknown population
standard deviation.
Z = (β̂i − βi)/s.e.(β̂i) .................................................. (2.54)

where s.e. = standard error.
Our first task is to choose a confidence coefficient, say 95 percent. We next look at the standard
normal table and find that the probability of the value of Z lying between −1.96 and 1.96 is
0.95. This may be written as follows:

P[β̂i − 1.96 s.e.(β̂i) ≤ βi ≤ β̂i + 1.96 s.e.(β̂i)] = 0.95

or βi = β̂i ± 1.96 s.e.(β̂i)
Example: Given β̂i = 9 and s.e.(β̂i) = 2, and choosing a value of 95 percent for the confidence
coefficient,

Solution: we find the confidence interval to be

βi = 9 ± 1.96(2)

i.e., 5.08 ≤ βi ≤ 12.92
Thus, from our single sample estimate we are 95% confident that the (unknown) true
population parameter will lie between 5.08 and 12.92.
If β̂i = 8.4 and s.e.(β̂i) = 2.2, construct the 95 percent confidence interval for βi.
ii) Confidence Interval from the t-distribution

Recall that

t = (β̂i − βi)/s.e.(β̂i) with (n − k) degrees of freedom

For a 95 percent confidence coefficient,

P(−t0.025 < (β̂i − βi)/s.e.(β̂i) < t0.025) = 0.95

Rearranging this we obtain

P[β̂i − t0.025·s.e.(β̂i) < βi < β̂i + t0.025·s.e.(β̂i)] = 0.95

Thus the 95 percent confidence interval for βi, when we use a small sample for its estimation, is

βi = β̂i ± t0.025·s.e.(β̂i) with (n − k) degrees of freedom
Example: Given the following regression from a sample of 20 observations:

Ŷi = 128.5 + 2.85Xi
     (38.2)   (0.85)

where the figures in parentheses are standard errors, construct the 95% confidence intervals
for the intercept and the slope.
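As a worked sketch of this exercise (in Python, for illustration): with n = 20 and k = 2 there are 18 degrees of freedom, and the two-tailed 5% t value from a t-table is approximately 2.101.

```python
# 95% confidence intervals for the regression above: b +/- t * s.e.

t_crit = 2.101  # t_{0.025} with 18 degrees of freedom (from a t-table)

for name, b, se in [("intercept", 128.5, 38.2), ("slope", 2.85, 0.85)]:
    low, high = b - t_crit * se, b + t_crit * se
    print(f"{name}: {low:.2f} to {high:.2f}")
# intercept: ~48.24 to ~208.76; slope: ~1.06 to ~4.64
```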
where Ŷi is the estimator of the true E(Yi) corresponding to a given X. Note that there are two
kinds of predictions in this regard:
i) Prediction of the conditional mean value of Y (mean prediction) and
ii) Prediction of an individual Y value corresponding to X0 (individual prediction)
a) Mean prediction
Suppose that we are interested in the prediction of the conditional mean of Y in (2.54)
corresponding to a chosen X, say X0. To fix the idea, assume that X0 = 100 and we want to
predict E(Y/X0 = 100). It can be shown that regression (2.54) provides the point estimate of this
mean prediction as follows:
Ŷ0 = β̂0 + β̂1X0
Note that since Yˆ0 is an estimator, it is likely to be different from its true value. The difference
between the two values will give some idea about the prediction or forecast error. In order to
see this we need to know the mean and the variance of Ŷ0. The variance is given by

Var(Ŷ0) = σu²[1/n + (X0 − X̄)²/Σxi²] ............................................. (2.56)
Recall that σ̂² = RSS/(n − k) = ΣÛi²/(n − k).
What we can infer from the above result is that the variance (and the standard error) increases
the further away the value of X0 is from X̄. Therefore the variable

t = (Ŷ0 − (β0 + β1X0))/s.e.(Ŷ0) = (β̂0 + β̂1X0 − β0 − β1X0)/s.e.(Ŷ0) ............................................ (2.58)

follows the t distribution with n − 2 degrees of freedom.
Now, suppose Var(Ŷ0) = 10.4759, where n = 10. We can construct the 95% confidence interval
for the true E(Y|X0) = β0 + β1X0. Note that the table value t0.025 for 8 degrees of freedom is
2.306. Moreover, recall that we obtained Ŷ0 = 75.36. Thus, the 95% confidence interval is given by

75.36 − 2.306·√10.4759 ≤ E(Y|X0 = 100) ≤ 75.36 + 2.306·√10.4759
b) Individual Prediction
If our interest lies in predicting an individual Y value, Y0, corresponding to a given X value, say
X0, then the application in forecasting is called individual prediction
Consider again the estimated line

Ŷ0 = 24.45 + 0.51X0

As discussed earlier, we can give the point estimate of Ŷ0 corresponding to a given value of X0, say 100.
The prediction error is

Ŷ0 − Y0 = (β̂0 + β̂1X0) − (β0 + β1X0 + U0)

and its estimated variance is

var(Ŷ0 − Y0) = σ̂u²[1 + 1/n + (X0 − X̄)²/Σ(Xi − X̄)²] ............................................... (2.61)

Note that the variance increases the further away the value of X0 is from X̄. The corresponding
standard error is

s.e.(Ŷ0 − Y0) = √var(Ŷ0 − Y0) ............................................... (2.62)
The standardized prediction error, t = (Y0 − Ŷ0)/s.e.(Ŷ0 − Y0), follows a t-distribution with
n − 2 degrees of freedom. Therefore, the t-distribution can be used to draw inferences about
the true Y0. Continuing with the above example, the point prediction of Y0 is 75.36. Suppose
that its variance is 52.63. Thus, the 95% confidence interval for Y0 corresponding to X0 = 100
is seen to be

75.36 − 2.306·√52.63 ≤ (Y0|X0 = 100) ≤ 75.36 + 2.306·√52.63

i.e., 58.63 ≤ (Y0|X0 = 100) ≤ 92.09
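Both interval predictions can be computed mechanically. A minimal Python sketch using the variances quoted in the text (illustrative, not part of the module):

```python
# Mean and individual prediction intervals at X0 = 100, using the
# variances from the text: Var(Yhat0) = 10.4759 (mean prediction) and
# Var(Yhat0 - Y0) = 52.63 (individual prediction); n = 10, so 8 df.

import math

y_hat0 = 75.36
t_crit = 2.306  # t_{0.025} with 8 degrees of freedom

for label, var in [("mean prediction", 10.4759), ("individual prediction", 52.63)]:
    half_width = t_crit * math.sqrt(var)
    print(f"{label}: {y_hat0 - half_width:.2f} to {y_hat0 + half_width:.2f}")
# individual prediction: ~58.63 to ~92.09, matching the interval above
```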
2.6 SUMMARY
The overall goodness of fit of the regression model is measured by the coefficient of
determination, r². It tells what proportion of the variation in the dependent variable (the
regressand) is explained by the explanatory variable (the regressor). This r² lies between 0
and 1; the closer it is to 1, the better the fit.
Hypothesis testing answers the question of whether a given finding is compatible with a
stated hypothesis or not. In hypothesis testing the Z-test and t test are used, among others.
If the model is deemed practically adequate, it may be used for forecasting (prediction).
In this regard we distinguish mean prediction and individual prediction.
5. One way of solving this problem is to use the Lagrangean function. Recall the concept of
constrained optimization from the Calculus for Economics course. The Lagrangean
function associated with the problem is developed as follows:
L = Σ(Y − β̂0 − β̂1X)² + λβ̂0

where λ is the Lagrangean multiplier. We minimize the function with respect to β̂0, β̂1 and λ:

∂L/∂β̂0 = −2Σ(Y − β̂0 − β̂1X) + λ = 0 ------------------------- (a)

∂L/∂β̂1 = −2ΣX(Y − β̂0 − β̂1X) = 0 ------------------------ (b)

∂L/∂λ = β̂0 = 0 ------------------------- (c)

Substituting (c) into (b) and re-arranging we obtain

−2ΣX(Y − β̂1X) = 0, and hence

β̂1 = ΣXY/ΣX²
a) Ŷi = 2.69 - 0.48 Xi
c) r2 = 0.66
For no. 1:
First we need the point estimate of Ŷ0 given X0 = 60:

Ŷ0 = 6.7 + 0.25(60) = 21.7

The 95% confidence interval is then (12.95, 30.45).
Thus, given X = 60 in repeated sampling, 95 out of 100 cases will include the true mean value
of Y in the interval given above.
For no. 2:
This is an individual prediction problem.
i) For X0 = 850,

Ŷ0 = 31.76 + 0.71(850) = 635.2

ii) Note that s.e.(Ŷ0) = 5.68. The value of t0.025 for 10 degrees of freedom is 2.23. Hence, the
95% confidence interval is given by

Y0 = 635.2 ± 2.23(5.68)

i.e., 622.5 < Y0 < 647.9

So we are 95% confident that the forecast value of Y (= Y0) will lie between 622.5 and 647.9.
The following table includes GDP(X) and the demand for food (Y) for a certain country over
ten year period.
Year 1980 81 82 83 84 85 86 87 88 89
Y 6 7 8 10 8 9 10 9 11 10
X 50 52 55 59 57 58 62 65 68 70
2. Calculate elasticity of Y with respect to X at their mean value and interpret your result.
3. Compute r2 and find the explained and unexplained variation in the food expenditure.
4. Compute the standard error of the regression estimates and conduct tests of significance at
the 5% significant level.
5. Find the 95% confidence interval for the population parameter (β0 and β1)
Ŷi = 31.76 + 0.71Xi, with r² = 0.99 and σ̂u² = 285.61
7. Construct a 95% confidence interval for the result you obtained in (6). [Hint: use individual
prediction approach]
Contents
3.0 Aims and Objectives
3.1 Introduction
3.2 Specification of the Model
3.3 Assumptions
3.4 Estimation
3.5 The Coefficient of Multiple Determination
3.6 Test of Significance in Multiple Regression
3.7 Forecasting Based on Multiple Regression
3.8 The Method of Maximum Likelihood (ML)
3.9 Summary
3.10 Answers to Check Your Progress
3.11 References
3.12 Model Examination Question
The purpose of this unit is to introduce you to the concept of the multiple linear regression
model and show how the method of OLS can be extended to estimate the parameters of such
models.
3.1 INTRODUCTION
We have studied the two-variable model extensively in the previous unit. But in economics you
hardly ever find that one variable is affected by only one explanatory variable. For example, the
demand for a commodity is dependent on price of the same commodity, price of other
competing or complementary goods, income of the consumer, number of consumers in the
market, etc. Hence the two-variable model is often inadequate in practical work. Therefore, we
need to discuss multiple regression models. The multiple linear regression is entirely concerned
with the relationship between a dependent variable (Y) and two or more explanatory variables
(X1, X2, …, Xn).
Let us start our discussion with the simplest multiple regression model i.e., model with two
explanatory variables.
Y = f(X1, X2)
Example: The demand for a commodity may be influenced not only by the price of the commodity
but also by the consumer's income.
Since the theory does not specify the mathematical form of the demand function, we assume
the relationship between Y, X1, and X2 is linear. Hence we may write the three variable
Population Regression Function (PRF) as follows:
Yi = β0 + β1X1i + β2X2i + Ui
3.3 ASSUMPTIONS
To complete the specification of our simple model we need some assumptions about the
random variable U. These assumptions are the same as those assumptions already explained in
the two-variables model in unit 2.
3.4 ESTIMATION
We have specified our model in the previous subsection. We have also stated the assumptions
required in subsection 3.3. Now let us take sample observations on Y, X1i and X2i and obtain
estimates of the true parameters β0, β1 and β2:
Yi X1i X2i
Y1 X11 X21
Y2 X12 X22
Y3 X13 X23
Yn X1n X2n
The sample regression function (SRF) can be written as

Yi = β̂0 + β̂1X1i + β̂2X2i + Ûi

where β̂0, β̂1 and β̂2 are estimates of the true parameters β0, β1 and β2.
As discussed in unit 2, the estimates will be obtained by choosing the values of the unknown
parameters that minimize the sum of squares of the residuals (OLS requires that ΣÛi² be as
small as possible). Symbolically,

Min ΣÛi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²
A necessary condition for a minimum is that the partial derivatives of the above expression
with respect to the unknowns (i.e., β̂0, β̂1 and β̂2) be equal to zero:

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂0 = 0

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂1 = 0

∂Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²/∂β̂2 = 0
Solving this system (with the variables in deviation form) yields

β̂1 = (Σx1iyi·Σx2i² − Σx2iyi·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)

β̂2 = (Σx2iyi·Σx1i² − Σx1iyi·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)
The estimates are unbiased estimates of the true parameters of the relationship between Y, X1
and X2: the expected value of each estimate is the true parameter itself.
The variances of the slope estimates β̂1 and β̂2 are

Var(β̂1) = σ̂u²·Σx2i² / (Σx1i²·Σx2i² − (Σx1ix2i)²)

Var(β̂2) = σ̂u²·Σx1i² / (Σx1i²·Σx2i² − (Σx1ix2i)²)

where σ̂u² = ΣÛi²/(n − k), k being the total number of parameters that are estimated (in the
above three-variable model, k = 3), and x1 and x2 are in deviation form.
In unit 2 we saw the coefficient of determination (r²) that measures the goodness of fit of the
regression equation. This notion of r² can be easily extended to regression models containing
more than two variables.
The quantity that gives this information is known as the multiple coefficient of determination. It
is denoted by R², with subscripts indicating the variables whose relationship is being studied.
Example: R²y.x1x2 shows the percentage of the total variation of Y explained by the
regression plane, that is, by changes in X1 and X2.
R²y.x1x2 = Σŷi²/Σyi² = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)²

        = 1 − ΣÛi²/Σyi² = 1 − RSS/TSS
where: RSS – residual sum of squares
TSS – total sum of squares
Recall that

ŷi = β̂1x1i + β̂2x2i    (the variables are in deviation form)

yi = ŷi + Ûi

ΣÛi² = Σ(yi − ŷi)² = Σ(yi − β̂1x1i − β̂2x2i)²

or ΣÛi² = ΣÛi·Ûi = ΣÛi(yi − β̂1x1i − β̂2x2i) = ΣÛiyi − β̂1ΣÛix1i − β̂2ΣÛix2i
The Adjusted R2
Note that as the number of regressors (explanatory variables) increases the coefficient of
multiple determinations will usually increase. To see this, recall the definition of R2
R² = 1 − ΣÛi²/Σyi²
Now Σyi² is independent of the number of X variables in the model because it is simply
Σ(Yi − Ȳ)². The residual sum of squares (RSS), ΣÛi², however, depends on the number of
explanatory variables present in the model. It is clear that as the number of X variables
increases, ΣÛi² is bound to decrease (at least it will not increase); hence R² will increase.
Therefore, in comparing two regression models with the same dependent variable but differing
numbers of X variables, one should be very wary of choosing the model with the highest R². An
explanatory variable which is not statistically significant may be retained in the model if one
looks at R² only. Therefore, to correct for this defect we adjust R² by taking into account the
degrees of freedom, which clearly decrease as new regressors are introduced into the function.
R̄² = 1 − [ΣÛi²/(n − k)] / [Σyi²/(n − 1)]

or

R̄² = 1 − (1 − R²)·(n − 1)/(n − k)

where k = the number of parameters in the model (including the intercept term),
n = the number of sample observations, and
R² = the unadjusted multiple coefficient of determination.
If n is large, R̄² and R² will not differ much. But with small samples, if the number of
regressors (X's) is large in relation to the number of sample observations, R̄² will be much
smaller than R².
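A one-function Python sketch of this adjustment (illustrative; the example values assume the wheat-yield regression of section 3.6 below, where R² = 0.98, n = 7 and k = 3):

```python
# Adjusted R-squared, as defined above.

def adjusted_r2(r2, n, k):
    """R2 adjusted for degrees of freedom; k counts all parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# e.g. the three-variable wheat-yield example: R2 = 0.98, n = 7, k = 3
print(adjusted_r2(0.98, n=7, k=3))  # 0.97
```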
The principle involved in testing multiple regressions is identical with that of simple regression.
We can test whether a particular variable X₁ or X₂ is significant or not holding the other
variable constant. The t test is used to test a hypothesis about any individual partial regression
coefficient. The partial regression coefficient measures the change in the mean value of Y,
E(Y/X₂,X₃), per unit change in X₂, holding X₃ constant.
t = β̂ᵢ/S(β̂ᵢ) ~ t(n – k)   (i = 0, 1, 2, …, k)
This is the observed (or sample) value of the t ratio, which we compare with the theoretical
value of t obtainable from the t-table with n – k degrees of freedom.
The theoretical values of t (at the chosen level of significance) are the critical values that define
the critical region in a two-tail test, with n – k degrees of freedom.
If the computed t value exceeds the critical t value at the chosen level of significance, we may
reject the hypothesis; otherwise, we may accept it (β̂ᵢ is not significant at the chosen level of
significance). Assume α = 0.05; then t_α/2 = 2.179 for 12 df.
[Figure: t distribution with a 95% acceptance region and critical regions of 2.5% in each tail]
For a number of degrees of freedom higher than 8 the critical value of t (at the 5% level of
significance) for the rejection of the null hypothesis is approximately 2.
The above joint hypothesis can be tested by the analysis of variance (AOV) technique. The
following table summarizes the idea.

Source of variation        Sum of squares   Degrees of freedom
Due to regression (ESS)    Σŷᵢ²             k – 1
Due to residual (RSS)      ΣUᵢ²             n – k
Total (total variation)    Σyᵢ²             n – 1
Therefore, to undertake the test, first find the calculated value of F and compare it with the F
tabulated. The calculated value of F can be obtained by using the following formula:
F = [Σŷᵢ²/(k – 1)] / [ΣUᵢ²/(n – k)] = [ESS/(k – 1)] / [RSS/(n – k)]
which follows the F distribution with k – 1 and n – k df.
When R² = 0, F is zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is
infinite. Thus the F test, which is a measure of the overall significance of the estimated
regression, is also a test of the significance of R². Testing the null hypothesis that all slope
coefficients are jointly zero is equivalent to testing the null hypothesis that the (population) R² is zero.
Example: Suppose we have data on wheat yield (Y), amount of fertilizer applied (X₁), and
amount of rainfall (X₂). It is assumed that the fluctuations in yield can be explained by varying
levels of rainfall and fertilizer.
Table 3.6.1
(1) Yield (Y)  (2) Fertilizer (X₁)  (3) Rainfall (X₂)  (4) yᵢ  (5) x₁ᵢ  (6) x₂ᵢ  (7) x₁ᵢyᵢ  (8) x₂ᵢyᵢ  (9) x₁ᵢx₂ᵢ
40             100                  10
50             200                  20
50             300                  10
70             400                  30
65             500                  20
65             600                  20
80             700                  30
ΣY = 420       ΣX₁ = 2800           ΣX₂ = 140
Ȳ = 60         X̄₁ = 400             X̄₂ = 20   (means)
(Columns 4 to 9 hold the deviations from the means and their cross-products, computed below.)
1. Find the OLS estimators (i.e., β̂₀, β̂₁ and β̂₂).
Solution: The formulas for β̂₀, β̂₁ and β̂₂ are
β̂₀ = Ȳ – β̂₁X̄₁ – β̂₂X̄₂
β̂₁ = (Σx₁ᵢyᵢ·Σx₂ᵢ² – Σx₂ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
β̂₂ = (Σx₂ᵢyᵢ·Σx₁ᵢ² – Σx₁ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
where the x’s and y’s are in deviation form.
Now find the deviations of the observations from their mean values (columns 4 to 9 in the
above table). The next step will be to insert the following values (in deviations) into the above
formulas: Σx₁ᵢ² = 280,000; Σx₂ᵢ² = 400; Σx₁ᵢx₂ᵢ = 7,000; Σx₁ᵢyᵢ = 16,500; Σx₂ᵢyᵢ = 600.
β̂₁ = [(16,500)(400) – (600)(7,000)] / [(280,000)(400) – (7,000)²] = 2,400,000/63,000,000 = 0.0381
β̂₂ = [(600)(280,000) – (16,500)(7,000)] / [(280,000)(400) – (7,000)²]
   = (168,000,000 – 115,500,000)/63,000,000 = 52,500,000/63,000,000 = 0.833
Now β̂₀ = 60 – (0.0381)(400) – (0.833)(20) = 28.1
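The arithmetic above can be checked with a short Python sketch that reproduces the Table 3.6.1 estimates from the deviation-form formulas (Python is used here purely for illustration):

```python
import numpy as np

# Reproducing the Table 3.6.1 estimates with the deviation-form formulas.
Y  = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)          # yield
X1 = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)   # fertilizer
X2 = np.array([10, 20, 10, 30, 20, 20, 30], dtype=float)          # rainfall

y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()          # deviations

den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2          # 63,000,000
b1 = (np.sum(x1*y)*np.sum(x2**2) - np.sum(x2*y)*np.sum(x1*x2)) / den
b2 = (np.sum(x2*y)*np.sum(x1**2) - np.sum(x1*y)*np.sum(x1*x2)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b1, b2, b0)   # approximately 0.0381, 0.833, 28.1
```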
Solution (variances and standard errors of the estimates):
Var(β̂₁) = σ̂ᵤ²·Σx₂ᵢ² / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²),   Var(β̂₂) = σ̂ᵤ²·Σx₁ᵢ² / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
The sum of squared residuals from the fitted equation is ΣUᵢ² = 21.4286.
Hence σ̂ᵤ² = 21.4286/(7 – 3) = 5.3572
Var(β̂₁) = (5.3572)(400) / [(280,000)(400) – (7,000)²] = 0.000034
S(β̂₁) = √0.000034 = 0.0058
Var(β̂₂) = (5.3572)(280,000)/63,000,000 = 0.02381
S(β̂₂) = √0.02381 = 0.1543
R² = (β̂₁Σx₁ᵢyᵢ + β̂₂Σx₂ᵢyᵢ)/Σyᵢ² = [(0.0381)(16,500) + (0.833)(600)]/1,150 = 0.98
Interpretation: 98% of the variation in yield is explained by the regression plane (i.e., by
variation in the amounts of fertilizer and rainfall). The model is a good fit.
For β̂₁: t_calculated = β̂₁/S(β̂₁) = 0.0381/0.0058 ≈ 6.6
t_tabulated = t₀.₀₂₅(7 – 3) = 2.78, which can be found from the statistical table (t-distribution).
Decision: Since t_calculated > t_tabulated, we reject H₀.
That is, β̂₁ is statistically significant: the variable X₁, fertilizer, significantly affects yield.
For β̂₂: t_calculated = β̂₂/S(β̂₂) = 0.833/0.1543 ≈ 5.4, and t_tabulated = t₀.₀₂₅(7 – 3) = 2.78.
Decision: Since t_calculated > t_tabulated, we reject H₀; β̂₂ is statistically significant.
The 95% confidence interval for β₁ is 0.0219 < β₁ < 0.0542.
Interpretation: The value of the true population parameter β₁ will lie between 0.0219 and
0.0542 in 95 out of 100 cases.
Note: The coefficients of X₁ and X₂ (β̂₁ and β̂₂) measure partial effects. For example, β̂₁
measures the change in the mean value of Y per unit change in X₁, holding X₂ constant.
Let us now turn our attention to the problem of forecasting the value of the dependent variable
for a given set of values of the explanatory variables. Suppose the given values of the
explanatory variables be X01, X02, X03,…, X0k, and let the corresponding value of the dependent
variable be Y0. Now we are interested in forecasting Y0.
For the three-variable case, the point forecast can be found as follows:
Ŷ₀ = β̂₀ + β̂₁X₀₁ + β̂₂X₀₂
Example 1. Consider the example in section 3.6 (Table 3.6.1). An interval forecast is given by
Ŷ₀ ± t_α/2·S(Ŷ₀)
where S(Ŷ₀) is the standard error of the forecast value, which can be found by using the following
formula:
S(Ŷ₀) = S·√(1 + X₀ᵀ(XᵀX)⁻¹X₀)
where S² = ΣUᵢ²/(n – k) = UᵀU/(n – k) = (YᵀY – β̂ᵀXᵀY)/(n – k)
and X₀ᵀ = [X₀₁, X₀₂, …, X₀k]
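A Python sketch of the point and interval forecast, continuing the Table 3.6.1 data; the forecast point X₀ = (350, 25) is hypothetical, chosen only for illustration:

```python
import numpy as np

# Interval forecast for the Table 3.6.1 example.
X = np.column_stack([np.ones(7),
                     [100, 200, 300, 400, 500, 600, 700],   # fertilizer
                     [10, 20, 10, 30, 20, 20, 30]])         # rainfall
Y = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y
U = Y - X @ beta
n, k = X.shape
S = np.sqrt(np.sum(U**2) / (n - k))          # square root of sigma-hat squared

X0 = np.array([1.0, 350.0, 25.0])            # hypothetical X01, X02 (and the 1)
Y0_hat = X0 @ beta                           # point forecast
se_f = S * np.sqrt(1 + X0 @ XtX_inv @ X0)    # standard error of the forecast
t_crit = 2.776                               # t_{0.025}(4) from the t-table
print(Y0_hat - t_crit * se_f, Y0_hat + t_crit * se_f)   # 95% interval
```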
Note: By order we mean the number of secondary subscripts.
Interpretation: For example, r₁₂.₃ shows that, holding X₃ constant, there is a positive or
negative association between Y and X₂.
A method of point estimation with some stronger theoretical properties than the method of OLS
is the method of maximum likelihood (ML),
where f(Yᵢ) = [1/(σ√(2π))]·exp{–(Yᵢ – β₀ – β₁Xᵢ)²/(2σ²)} ………………(2)
is the density function of a normally distributed variable with the given mean and
variance. The joint density of Y₁, Y₂, …, Yₙ is
f(Y₁, Y₂, …, Yₙ | β₀ + β₁Xᵢ, σ²) = [1/(σⁿ(√(2π))ⁿ)]·exp{–Σ(Yᵢ – β₀ – β₁Xᵢ)²/(2σ²)} …………….(3)
If Y₁, Y₂, …, Yₙ are known or given, but β₀, β₁ and σ² are not known, the function in (3) is
called a likelihood function, denoted by LF(β₀, β₁, σ²).
The method of maximum likelihood, as the name indicates, consists in estimating the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (or
maximum) as possible. Therefore, we have to find the maximum of the likelihood function (4).
Using your knowledge of differential calculus, the log-likelihood is
ln LF = –n·ln σ – (n/2)·ln(2π) – ½Σ(Yᵢ – β₀ – β₁Xᵢ)²/σ² ……………(5)
      = –(n/2)·ln σ² – (n/2)·ln(2π) – ½Σ(Yᵢ – β₀ – β₁Xᵢ)²/σ² ………………(6)
Differentiating partially with respect to β₀, β₁ and σ², and setting the results to zero, we obtain
∂lnLF/∂β₀ = –(1/σ²)Σ(Yᵢ – β₀ – β₁Xᵢ)(–1) = 0 ………………………(7)
∂lnLF/∂β₁ = –(1/σ²)Σ(Yᵢ – β₀ – β₁Xᵢ)(–Xᵢ) = 0 ………………………(8)
∂lnLF/∂σ² = –n/(2σ²) + (1/(2σ⁴))Σ(Yᵢ – β₀ – β₁Xᵢ)² = 0 …………………..(9)
The above equations can be rewritten as (letting β̃₀, β̃₁ and σ̃² denote the ML estimators):
(1/σ̃²)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ) = 0 ……………………………………(10)
(1/σ̃²)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)Xᵢ = 0 …………………………………(11)
–n/(2σ̃²) + (1/(2σ̃⁴))Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)² = 0 ………………………….(12)
After simplifying: ΣYᵢ = nβ̃₀ + β̃₁ΣXᵢ ………………………………(13)
ΣYᵢXᵢ = β̃₀ΣXᵢ + β̃₁ΣXᵢ² …………………………….(14)
These are precisely the normal equations of the least squares theory obtained in unit 2.
Therefore, the ML estimators, the β̃’s, are the same as the OLS estimators.
From equation (12), σ̃² = (1/n)Σ(Yᵢ – β̃₀ – β̃₁Xᵢ)² = (1/n)ΣÛᵢ²
This differs from the OLS estimator σ̂² = ΣÛᵢ²/(n – 2),
which was shown to be an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The
magnitude of this bias can be easily determined as follows:
E(σ̃²) = (1/n)E(ΣÛᵢ²) = [(n – 2)/n]σ² = σ² – (2/n)σ²
which shows that σ̃² is biased downward (i.e., it underestimates the true σ²) in small samples.
But notice that as n, the sample size, increases indefinitely, the second term above, (2/n)σ², the
bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃² is
unbiased too.
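This small-sample bias is easy to see by simulation. The following sketch (hypothetical data-generating process and seed) compares the average of σ̃² = ΣÛᵢ²/n with that of the unbiased ΣÛᵢ²/(n – 2):

```python
import numpy as np

# Simulation sketch: the ML estimator of sigma^2 divides by n and is
# biased downward in small samples; dividing by n - 2 removes the bias.
rng = np.random.default_rng(1)
n, sigma2, reps = 20, 4.0, 20000
X = np.linspace(1, 10, n)
ml, ols = [], []
for _ in range(reps):
    Y = 1.0 + 0.5 * X + rng.normal(0, np.sqrt(sigma2), n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
    b0 = Y.mean() - b1 * X.mean()
    rss = np.sum((Y - b0 - b1 * X)**2)
    ml.append(rss / n)          # ML estimator (biased)
    ols.append(rss / (n - 2))   # unbiased estimator
print(np.mean(ml), np.mean(ols))   # about sigma2*(n-2)/n = 3.6 versus about 4.0
```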
3.9 SUMMARY
3. Normality: Uᵢ ~ N(0, σᵤ²)
4. No serial correlation (serial independence of the U’s): Cov(Uᵢ, Uⱼ) = 0 for i ≠ j
5. Independence of Uᵢ and Xᵢ: Cov(Uᵢ, X₁ᵢ) = Cov(Uᵢ, X₂ᵢ) = 0
6. No collinearity between the X variables (No multicollinearity)
7. Correct specification of the model
Formulas for the parameters:
β̂₀ = Ȳ – β̂₁X̄₁ – β̂₂X̄₂
β̂₁ = (Σx₁ᵢyᵢ·Σx₂ᵢ² – Σx₂ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
β̂₂ = (Σx₂ᵢyᵢ·Σx₁ᵢ² – Σx₁ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² – (Σx₁ᵢx₂ᵢ)²)
Var(β̂₀) = σ̂ᵤ²·[1/n + (X̄₁²Σx₂² + X̄₂²Σx₁² – 2X̄₁X̄₂Σx₁x₂) / (Σx₁²·Σx₂² – (Σx₁x₂)²)]
Var(β̂₁) = σ̂ᵤ²·Σx₂² / (Σx₁²·Σx₂² – (Σx₁x₂)²)
Var(β̂₂) = σ̂ᵤ²·Σx₁² / (Σx₁²·Σx₂² – (Σx₁x₂)²)
where σ̂ᵤ² = ΣUᵢ²/(n – k), k being the total number of parameters that are estimated, and
x₁ and x₂ are in deviation form.
The multiple coefficient of determination (R²) measures the proportion of the variation in Y
explained by the variables X₁ and X₂ jointly:
R²y.X₁X₂ = Σŷᵢ²/Σyᵢ² = Σ(Ŷᵢ – Ȳ)²/Σ(Yᵢ – Ȳ)² = 1 – ΣUᵢ²/Σyᵢ² = 1 – RSS/TSS
The adjusted R²:
R̄² = 1 – [ΣUᵢ²/(n – k)] / [Σyᵢ²/(n – 1)]   or   R̄² = 1 – (1 – R²)·(n – 1)/(n – k)
Hypothesis testing: H₀: βᵢ = 0 against H₁: βᵢ ≠ 0 (or one-sided: βᵢ > 0, βᵢ < 0)
F_cal = [Σŷᵢ²/(k – 1)] / [ΣUᵢ²/(n – k)] = [ESS/(k – 1)] / [RSS/(n – k)]
which follows the F distribution with k – 1 and n – k df.
Decision Rule: If F_calculated > F_tabulated (F(k – 1, n – k)), reject H₀; otherwise you may accept it.
Equivalently, F = [R²/(k – 1)] / [(1 – R²)/(n – k)]
Forecasting
Point forecast vs. interval forecast (the forecasted value will lie in the interval (a, b)).
The 95% confidence interval for Y₀ (forecasted value) can be constructed by making use of
(Ŷ₀ – Y₀)/S(Ŷ₀) ~ t(n – k)   and   P(–t_α/2 < t < t_α/2) = 1 – α
The method of maximum likelihood, as the name indicates, consists in estimating the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (or
maximum) as possible. The ML estimators, the β̃’s, are the same as the OLS estimators.
B) Answer: R² = 0.894
Interpretation: The variables X₁ and X₂ explain 89% of the total variation in Y.
C) Answer: Var(β̂₁) = 6.53, Var(β̂₂) = 0.0001
S(β̂₁) = 2.55, S(β̂₂) = 0.01
D) Answer: β̂₀ and β̂₁ are statistically significant;
β̂₂ is not statistically significant.
E) Answer: (a) –13.22075 < β₁ < –1.15925
(b) –0.00965 < β₂ < 0.03765
1. The following table shows observations on the quantity of oranges sold (Y), price in cents
(X₁), and advertising expenditures (X₂).
Quantity (Y)   Price (X₁)   Advertising expenditure (X₂)
55             100          5.5
70             90           6.3
2. The following results were obtained from a sample of 12 firms on their output (Y), labor
input (X₁) and capital input (X₂), measured in arbitrary units.
ΣY = 753    ΣY² = 48,139    ΣYX₁ = 40,830
ΣX₁ = 643   ΣX₁² = 34,843   ΣYX₂ = 6,796
ΣX₂ = 106   ΣX₂² = 976      ΣX₁X₂ = 5,779
a) Find the least squares equation of Y on X1 and X2. What is the economic meaning of
your coefficients?
b) Given the following sample values of output (Y), compute the standard errors of the
estimates and test their statistical significance.
Firms A B C D E F G H I J K L
Output 64 71 53 67 55 58 77 57 56 51 76 68
c) Find the multiple correlation coefficient and the unexplained variation in output
d) Construct 99 percent confidence intervals for the population parameters.
3. From the following data estimate the partial regression coefficients, their standard errors, and
the adjusted and unadjusted R² values.
Σ(Yᵢ – Ȳ)(X₃ᵢ – X̄₃) = 4250.900    Σ(X₂ᵢ – X̄₂)(X₃ᵢ – X̄₃) = 4796.000
n = 15
4. The following represents the true relationship between the independent variables
X1, X2, X3, and the dependent variable Y
Yi= bo+b1X1i+b2X2i+b3X3i+Ui
5. There are occasions when the two-variable linear regression model assumes the following
form:
Yᵢ = βXᵢ + Eᵢ
where β is the parameter and E is the disturbance term. In this model the intercept
term is zero. The model is therefore known as regression through the origin.
i) Show that the least squares estimator is β̂ = ΣXᵢYᵢ/ΣXᵢ²
8. The quantity demanded of a commodity Y is assumed to be a linear function of its price X. The
following results have been obtained from a sample of 10 observations.
Price in Birr (X):    15   13   12   12   9    7    7    4    6    3
Quantity in kg (Y):   760  775  780  785  790  795  800  810  830  840
i) Show that Cov(b₀, b₁) = –X̄σᵤ²/Σxᵢ² under the basic assumptions of the linear regression model.
ii) Show that the estimated regression line passes through the mean values of X and Y.
13. Assume that the quantity supplied of a commodity Y is a linear function of its price (X₁)
and the wage rate of labor used (X₂) in the production of the commodity. The sample values are
summarized as follows:
ΣY = 1,282   ΣY² = 132,670   ΣX₁Y = 53,666
ΣX₁ = 545    ΣX₁² = 22,922   ΣX₂Y = 5,707
ΣX₂ = 86     ΣX₂² = 617      ΣX₁X₂ = 2,568
i) Using OLS estimate the parameters of the model
ii) Estimate the elasticities and interpret your results
iii) Forecast the supply for a particular commodity at X1=32 and X2=10, set a 95%
confidence interval for the forecasted value.
iv) Test the overall significance of the supply function
CONTENT
4.0 Aims and Objective
4.1 Introduction
4.2 Zero Expected Disturbances
4.3 Homoscedasticity
4.4 Autocorrelation
4.5 Multicollinearity
4.6 Summary
4.7 Answers to Check Your Progress
4.8 Model Examination
4.9 Reference
The aim of this unit is to show the reader what is meant by violation of the basic econometric assumptions that formed
the basis of the classical linear regression model. After completing this unit, the student will understand
the consequences of, tests for, and remedies to such violations.
4.1 INTRODUCTION
It can be shown that by using these observations we would get a bad estimate of the true line. If
the true line lies below or above the observations, the estimated line would be biased.
The above figure shows that the estimated line Ŷ is not a good approximation to the true line,
E(Y).
Note that there is no test for the verification of this assumption because the assumption E(U) =
0 is forced upon us if we are to establish the true relationship. That is, we set E(U) = 0 at the
outset of our estimation procedure. Its plausibility should be examined in each particular case
on a priori grounds. In any econometric application we must be sure that the following things
are fulfilled so as to be safe from violating the assumption of E(U) = 0
i) All the important variables have been included into the function.
ii) There are no systematically positive or systematically negative errors of
measurement in the dependent variable.
The assumption of homoscedasticity requires that the variance of Uᵢ
around its zero mean does not depend on the value of X; that is, σᵤᵢ² ≠ f(Xᵢ). Consider the
following diagram.
In figure (b) we picture the case of (monotonically) increasing variance of the Uᵢ’s: as X increases,
so does the variance of U. This is a common form of heteroscedasticity assumed in econometric
applications. That is, the larger an independent variable, the larger the variance of the
associated disturbance. Various examples can be stated in support of this argument. For
instance, if consumption is a function of the level of income, at higher levels of income (the
independent variable) there is greater scope for the consumer to act on whims and deviate by
larger amounts from the specified consumption relationship. The following diagram depicts this
case.
[Figure: consumption plotted against income, showing a wider spread of observations at high income than at low income]
Note, however, that the problem of heteroscedasticity is more a problem of cross-sectional data
than of time series data; that is, the problem is more serious in cross-sectional data.
B) Causes of Heteroscedasticity
Heteroscedasticity can arise for several reasons. The first one is the presence of
outliers (i.e., extreme values compared to the majority of a variable). The inclusion or exclusion
of such an observation, especially if the sample size is small, can substantially alter the results
of regression analysis. With outliers it would be hard to maintain the assumption of
homoscedasticity.
Another source of heteroscedasticity arises from violating the assumption that the regression
model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact
that some important variables are omitted from the model. In such situation the residuals
obtained from the regression may give the distinct impression that the error variance may not
be constant. But if the omitted variables are included in the model, the impression may
disappear.
i) If U is heteroscedastic, the OLS estimates do not have the minimum variance property in
the class of unbiased estimators; that is, they are inefficient in small samples.
Furthermore, they are inefficient in large samples.
ii) The coefficient estimates would still be statistically unbiased; that is, the expected value
of the β̂’s equals the true parameters.
In figure (a) we see that there is no systematic relationship between the two variables,
suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) and (c), however,
suggest a linear relationship between the two variables; in particular, figure (c) reveals or
suggests that the heteroscedastic variance may be proportional to the value of Y or X. Figures (d)
and (e) indicate a quadratic relationship between Ûᵢ² and Ŷᵢ or X. This knowledge may help
us in transforming our data in such a manner that in the regression on the transformed data the
variance of the disturbance is homoscedastic. Note that this visual inspection method is also
known as the informal method. The following tests follow the formal method.
Since σᵢ² is generally not known, Park suggests using Ûᵢ² as a proxy and running the following
regression:
ln Ûᵢ² = ln σ² + β·ln Xᵢ + vᵢ
       = α + β·ln Xᵢ + vᵢ .......................................(4.3)
If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in
the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity.
The Park test is thus a two-stage procedure. In the first stage we run the OLS regression
disregarding the heteroscedasticity question. We obtain Ûᵢ from this regression, and then in the
second stage we run the regression stated in (4.3).
Example: Consider a relationship between compensation (Y) and productivity (X). To
illustrate the Park approach, the following regression function is used:
Yᵢ = β₀ + β₁Xᵢ + Uᵢ .......................................(4.4)
Suppose data on Y and X are used to come up with the following result:
Ŷᵢ = 1992.34 + 0.23Xᵢ
s.e. = (936.48) (0.09)
t = (2.13) (2.33),  r² = 0.44
Suppose that the residuals obtained from the above regression, were regressed on Xi as
suggested in (4.3), giving the following results.
As shown in the above result (t value), the coefficient of ln Xᵢ is not significant. That is, there is
no statistically significant relationship between the two variables. Following the Park test, one
may conclude that there is no heteroscedasticity in the error variance.
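The two stages of the Park test can be sketched in Python as follows (hypothetical data; the error variance is made to grow with X by construction so that the test has something to detect):

```python
import numpy as np

# A sketch of the two-stage Park test on hypothetical data.
rng = np.random.default_rng(2)
n = 40
X = rng.uniform(1, 10, n)
Y = 2 + 3 * X + rng.normal(0, X)        # error sd grows with X by design

# Stage 1: OLS of Y on X, keep the residuals.
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
U = Y - A @ b

# Stage 2: regress ln(U^2) on ln(X); a significant slope suggests
# heteroscedasticity, as in (4.3).
Z = np.column_stack([np.ones(n), np.log(X)])
w = np.log(U**2)
g = np.linalg.lstsq(Z, w, rcond=None)[0]
resid = w - Z @ g
s2 = np.sum(resid**2) / (n - 2)
se_slope = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
print(g[1] / se_slope)    # compare with the t critical value, df = n - 2
```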
rₛ = 1 – 6·Σdᵢ²/[n(n² – 1)] ........................................... (4.5)
where dᵢ = the difference in the ranks assigned to two different characteristics of the i-th individual or
phenomenon and n = the number of individuals or phenomena ranked. The steps required in this
test are stated as follows.
Assume Yᵢ = β₀ + β₁Xᵢ + Uᵢ
Step 1. Fit the regression to the data on Y and X and obtain the residuals Ûᵢ.
Step 2. Ignoring the sign of Ûᵢ, that is, taking their absolute value |Ûᵢ|, rank both |Ûᵢ|
and Xᵢ (or Ŷᵢ) according to an ascending or descending order and compute the
Spearman rank correlation coefficient given previously, (4.5).
Step 3. Assuming that the population rank correlation coefficient ρₛ is zero and n > 8,
the significance of the sample rₛ can be tested by the t test as follows:
t = rₛ·√(n – 2)/√(1 – rₛ²) ........................................... (4.6)
with df = n – 2.
If the computed t value exceeds the critical t value, we may accept the hypothesis of
heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X
variable, rₛ can be computed between |Ûᵢ| and each of the X variables separately and can be
tested for statistical significance by the t test given above.
Example: To illustrate the rank correlation test, consider the regression Yᵢ = β₀ + β₁Xᵢ. Suppose
10 observations are used to fit this equation. The following table makes use of the rank correlation test.

Observation   Y     X     Ŷ      Û = (Y – Ŷ)   Rank of |Û|   Rank of X   d    d²
1             12.4  12.1  11.37   1.03          9             4           5    25
2             14.4  21.4  15.64  –1.24         10             9           1     1
3             14.6  18.4  14.40   0.20          4             7          –3     9
4             16.0  21.7  15.78   0.22          5            10          –5    25
5             11.3  12.5  11.56  –0.26          6             5           1     1
6             10.0  10.4  10.59  –0.59          7             2           5    25
7             16.2  20.8  15.37   0.83          8             8           0     0
8             10.4  10.2  10.50  –0.10          3             1           2     4
9             13.1  16.0  13.16  –0.06          2             6          –4    16
10            11.3  12.0  11.33  –0.03          1             3          –2     4
Total                                                                     0   110

rₛ = 1 – 6(110)/[10(100 – 1)] = 0.33
t = 0.33·√8/√(1 – 0.1089) = 0.99
Note that for 8 (= 10 – 2) df this t value is not significant even at the 10% level of significance.
Thus, there is no evidence of a systematic relationship between the explanatory variable and the
absolute value of the residuals, which might suggest that there is no heteroscedasticity.
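The rank-correlation arithmetic of the table can be reproduced directly (using the rounded rₛ = 0.33 the text obtains t ≈ 0.99; the exact rₛ = 1/3 gives t ≈ 1.0):

```python
import numpy as np

# Reproducing the rank-correlation computation from the table above.
d2_sum, n = 110, 10
rs = 1 - 6 * d2_sum / (n * (n**2 - 1))            # = 0.3333 (text rounds to 0.33)
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)      # about 1.0; 0.99 with rs = 0.33
print(rs, t)   # not significant, so no evidence of heteroscedasticity
```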
The Goldfeld-Quandt test proceeds in the following steps.
Step I: The observations are ordered according to the magnitude of the independent variable
thought to be related to the variance of the disturbances.
Step II: A certain number of central observations (represented by c) are omitted, leaving two
equal-sized groups of observations, one group corresponding to low values of the chosen
independent variable and the other group corresponding to high values. Note that the
observations are omitted to sharpen or accentuate the difference between the small variance and
the large variance group.
Step III: We fit separate regressions to each sub-sample, obtain the sum of squared
residuals from each of them, and form the ratio of their sums of squared residuals. Here
ΣÛ₁² = the residual sum of squares from the sub-sample of low values of X, with [(n – c)/2] – k
degrees of freedom, and ΣÛ₂² = that from the sub-sample of high values of X, with the same
degrees of freedom.
If each of these sums is divided by the appropriate degrees of freedom, we obtain estimates of
the variances of the Û’s in the two sub-samples.
Step IV: Compute the ratio of the two variances:
F* = [ΣÛ₂²/((n – c – 2k)/2)] / [ΣÛ₁²/((n – c – 2k)/2)] = ΣÛ₂²/ΣÛ₁² .........................................(4.7)
This has an F distribution (with numerator and denominator degrees of freedom each equal to (n – c – 2k)/2,
where n = total number of observations, c = central observations omitted, k = number of
parameters estimated from each regression). If the two variances are the same (that is, if the
Û’s are homoscedastic), the value of F* will tend to one. If the variances differ, F* will have a
large value (given that by the design of the test ΣÛ₂² > ΣÛ₁²). Generally, the observed F* is
compared with the theoretical value of F with (n – c – 2k)/2 degrees of freedom (at a chosen level
of significance).
Example: Suppose that we have data on consumption expenditure in relation to income for a
cross section of 30 families. Suppose we postulate that consumption expenditure is linearly
related to income but that heteroscedasticity is present in the data. Suppose further that the
middle 4 observations are dropped after the necessary reordering of the data, and that we
obtain the following result after performing a separate regression on each of the two 13-
observation groups:
F* = (1536.8/11) / (377.17/11) = 4.07
Note from the F table in the appendix that the critical F value for 11 numerator and 11
denominator df at the 5% level is 2.82. Since the estimated F* value exceeds the critical value,
we may conclude that there is heteroscedasticity in the error variance.
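A sketch of the Goldfeld-Quandt mechanics on hypothetical data (the error standard deviation is made proportional to X so that heteroscedasticity is present by construction):

```python
import numpy as np

# Goldfeld-Quandt sketch: order by X, drop c central observations,
# fit OLS to each half and compare residual sums of squares.
def gq_fstar(X, Y, c):
    order = np.argsort(X)
    X, Y = X[order], Y[order]
    m = (len(X) - c) // 2
    def rss(x, y):
        A = np.column_stack([np.ones(len(x)), x])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        return np.sum((y - A @ b)**2)
    # By the design of the test the high-X group goes in the numerator.
    return rss(X[-m:], Y[-m:]) / rss(X[:m], Y[:m])

rng = np.random.default_rng(3)
X = rng.uniform(1, 10, 30)
Y = 5 + 2 * X + rng.normal(0, X)      # heteroscedastic by construction
print(gq_fstar(X, Y, 4))   # compare with the F(11, 11) critical value 2.82
```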
Note, however, that the ability of the Goldfeld-Quandt test to perform successfully depends on
how c is chosen. Moreover, its success depends on identifying the correct X (i.e., independent)
variable with which to order the observations. This limitation of the test can be avoided if we
consider the Breusch-Pagan-Godfrey (BPG) test.
The BPG test assumes that σᵢ² is some function of the non-stochastic variables Z; some or all of the X’s can serve
as Z’s. Specifically, assume that
σᵢ² = α₀ + α₁Z₁ᵢ + … + αₘZₘᵢ ..........................................(4.10)
Step 1. Estimate the regression by OLS and obtain the residuals Ûᵢ.
Step 2. Obtain σ̃² = ΣÛᵢ²/n. Note that this is the maximum likelihood estimator of σ².
(Recall from the previous discussion in unit two that the OLS estimator is ΣÛᵢ²/(n – k).)
Example: Suppose we have 30 observations on Y and X that give the following
regression results (after performing the first steps of the test).
Step 4. Assuming that the pᵢ are linearly related to Xᵢ (= Zᵢ), we obtain the following
regression result:
p̂ᵢ = –0.74 + 0.01Xᵢ,  ESS = 10.42
Step 5. Θ = ½(ESS) = 5.21
From the Chi-square table we find that for 1 df the 5% critical Chi-square value is 3.84. Thus,
the observed Chi-square value is significant at the 5% level of significance, suggesting
heteroscedasticity in the error variance.
Note that the BPG test is asymptotic, that is, a large-sample test. The test is sensitive in small
samples with regard to the assumption that the disturbances vᵢ are normally distributed.
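The BPG steps can be sketched as follows; the construction pᵢ = Ûᵢ²/σ̃² in step 3, which the quoted step 4 presupposes, is filled in here (hypothetical data, Z = X):

```python
import numpy as np

# A sketch of the BPG steps quoted above.
rng = np.random.default_rng(4)
n = 30
X = rng.uniform(1, 10, n)
Y = 3 + 2 * X + rng.normal(0, 0.5 * X)     # heteroscedastic by construction

# Step 1: OLS residuals.
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
U = Y - A @ b

sigma2_ml = np.sum(U**2) / n               # Step 2: ML estimator of sigma^2
p = U**2 / sigma2_ml                       # Step 3: construct p_i (assumed form)
g = np.linalg.lstsq(A, p, rcond=None)[0]   # Step 4: regress p on Z (= X)
ess = np.sum((A @ g - p.mean())**2)        # explained sum of squares
theta = 0.5 * ess                          # Step 5: test statistic
print(theta)    # compare with the chi-square(1) critical value 3.84
```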
Assumption one: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ,
suppose that we assume the error variance is proportional to Xᵢ². That is,
E(Uᵢ²) = σ²Xᵢ²
If, as a matter of “speculation” or from graphical methods, it is believed that the variance of Uᵢ
is proportional to the square of the explanatory variable X, one may transform the original
model as follows. Divide the original model through by Xᵢ to obtain
Yᵢ/Xᵢ = β₀(1/Xᵢ) + β₁ + Uᵢ/Xᵢ
      = β₁ + β₀(1/Xᵢ) + Vᵢ ............................................... (4.11)
where Vᵢ is the transformed disturbance term, equal to Uᵢ/Xᵢ. Now it is easy to verify that
E(Vᵢ²) = E(Uᵢ/Xᵢ)² = (1/Xᵢ²)·E(Uᵢ²) = σ²
Thus the variance of Vᵢ is homoscedastic and one may proceed to apply OLS to the transformed
equation. Notice that in the transformed regression the intercept term β₁ is the slope coefficient
in the original equation and the slope coefficient β₀ is the intercept term in the original model.
Therefore, to get back to the original model we shall have to multiply the estimated (4.11) by Xᵢ.
Assumption two: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ, suppose that we assume the error
variance to be proportional to Xᵢ. That is,
E(Uᵢ²) = σ²Xᵢ
This requires the square root transformation. If graphical inspection suggests that the
variance of Uᵢ is proportional to Xᵢ, the original model can be transformed by dividing it
by √Xᵢ. That is,
Yᵢ/√Xᵢ = β₀(1/√Xᵢ) + β₁√Xᵢ + Uᵢ/√Xᵢ ............................................... (4.12)
With Vᵢ = Uᵢ/√Xᵢ, it is easy to verify that
E(Vᵢ²) = (1/Xᵢ)·E(Uᵢ²) = σ²
Therefore, one may proceed to apply OLS to the transformed equation. Note an important
feature of the transformed model: it has no intercept term. Therefore, one will have to use the
regression-through-the-origin model to estimate β₀ and β₁. Having run the regression on the
transformed model (4.12), one can get back to the original model simply by multiplying it by
√Xᵢ.
Assumption three: A log transformation such as
ln Yᵢ = β₀ + β₁·ln Xᵢ + Uᵢ
very often reduces heteroscedasticity when compared with the regression
Yᵢ = β₀ + β₁Xᵢ + Uᵢ
This result arises because log transformation compresses the scales in which the variables are
measured. For example, log transformation reduces a ten-fold difference between two values
(such as between 8 and 80) into roughly a two-fold difference (because ln 80 = 4.38 and ln 8 = 2.08).
To conclude, the remedial measures explained above through transformation point out that we
are essentially speculating about the nature of σᵢ². Note also that the OLS estimators obtained
from the transformed equations are BLUE. Which of the transformations discussed will work
will depend on the nature of the problem and the severity of heteroscedasticity. Moreover, we
may not know a priori which of the X variables should be chosen for transforming the data in
the case of a multiple regression model. In addition, log transformation is not applicable if some
of the Y or X values are zero or negative.
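The first transformation can be illustrated as follows (hypothetical data with the error variance proportional to Xᵢ²); note how the intercept and slope switch roles in the transformed regression, as described above:

```python
import numpy as np

# Sketch of the remedial transformation for assumption one:
# divide the model through by X_i and run OLS on Y/X against 1 and 1/X.
rng = np.random.default_rng(5)
n = 50
X = rng.uniform(1, 10, n)
Y = 4 + 1.5 * X + rng.normal(0, X)      # error sd proportional to X

Ystar = Y / X
A = np.column_stack([np.ones(n), 1.0 / X])   # regressors: 1 and 1/X
g = np.linalg.lstsq(A, Ystar, rcond=None)[0]
b1_hat, b0_hat = g[0], g[1]    # roles are swapped, as noted in the text
print(b0_hat, b1_hat)          # estimates of the original intercept and slope
```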
1. State with brief reasons whether the following statements are true, false, or uncertain.
a) In the presence of heteroscedasticity OLS estimators are biased as well as
inefficient.
b) If heteroscedasticity is present, the conventional t and F tests are invalid.
2. State three consequences of heteroscedasticity.
3. List and explain the steps of the BPG test.
4. Suppose that you have data on personal saving and personal income of Ethiopia for a 31-
year period. Assume that graphical inspection suggests that the Uᵢ's are heteroscedastic, so
that you want to employ the Goldfeld-Quandt test. Suppose you ordered the
observations in ascending order of income and omitted the nine central observations.
Applying OLS to each subset, you obtained the following results.
a) For subset I:    Ŝ₁ = –738.84 + 0.008Iᵢ,   ΣÛ₁² = 144,771.5
b) For subset II:   Ŝ₂ = 1141.07 + 0.029Iᵢ,  ΣÛ₂² = 769,899.2
Is there any evidence of heteroscedasticity?
4.4 AUTOCORRELATION
A. The Nature of Autocorrelation
An important assumption of the classical linear model is that there is no autocorrelation or
serial correlation among the disturbances Ui entering into the population regression function.
This assumption implies that the covariance of Uᵢ and Uⱼ is equal to zero. That is:
Cov(Uᵢ, Uⱼ) = E{[Uᵢ – E(Uᵢ)][Uⱼ – E(Uⱼ)]} = E(UᵢUⱼ) = 0 for i ≠ j
Since autocorrelated errors arise most frequently in time series models, the discussion in the
rest of this unit is couched in terms of time series data.
There are a number of time-series patterns or processes that can be used to model correlated
errors. The most common is what is known as the first-order autoregressive, or AR(1),
process. Consider
Yₜ = β₀ + β₁Xₜ + Uₜ
where t denotes the observation at time t (i.e., time series data). With this, one can assume that the disturbances
are generated as follows:
Uₜ = ρUₜ₋₁ + εₜ
where ρ is known as the coefficient of autocovariance and εₜ is a stochastic disturbance
satisfying the standard OLS assumptions, namely
E(εₜ) = 0
Var(εₜ) = σε²
Cov(εₜ, εₜ₊ₛ) = 0 for s ≠ 0
where the subscript s represents the length of the lag. It can be shown that
ρ = Cov(Uₜ, Uₜ₋₁)/Var(Uₜ),  with –1 < ρ < 1
Hence, ρ (rho) is the simple correlation of the successive errors of the original model.
Note that when ρ > 0 successive errors are positively correlated and when ρ < 0 successive
errors are negatively correlated. It can be shown that corr(Uₜ, Uₜ₋ₛ) = ρˢ (where s represents the
length of the lag). This implies that the correlation (be it negative or positive) between any two
periods diminishes as time goes by, i.e., as s increases.
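The AR(1) scheme and the decay of corr(Uₜ, Uₜ₋ₛ) = ρˢ can be checked by simulation (hypothetical ρ and seed):

```python
import numpy as np

# Generate an AR(1) disturbance U_t = rho * U_{t-1} + eps_t and check
# that corr(U_t, U_{t-s}) decays roughly like rho^s.
rng = np.random.default_rng(6)
rho, n = 0.7, 5000
eps = rng.normal(0, 1, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho * U[t - 1] + eps[t]

for s in (1, 2, 3):
    r = np.corrcoef(U[s:], U[:-s])[0, 1]
    print(s, r, rho**s)     # sample correlation versus theoretical rho^s
```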
b) Consequences of Autocorrelation
When the disturbance term exhibits serial correlation the value as well as the standard errors of
the parameter estimates are affected.
Notice from the diagram that the OLS estimated line gives a better fit to the data than the true
relationship. This reveals why in this context r² is overestimated and σᵤ² (and the variance of the
OLS estimates) is underestimated. When the standard errors of the β̂’s are biased downwards, this
leads to confidence intervals which are much too narrow. Moreover, the parameter estimate of an
irrelevant explanatory variable may be highly significant. In other words, the figure reveals that the
estimated error terms Ûᵢ are closer to the regression line than are the U’s to the true line, and
thus we would have a serious underestimation of σᵤ².
Note that since the population disturbances Uₜ cannot be observed directly, we use their proxy,
the residuals Ûₜ, which can be obtained from the usual OLS procedure. The examination of Ûₜ
can provide useful information not only about autocorrelation but also about heteroscedasticity,
model inadequacy, or specification bias.
i) Graphical Method
Some rough idea about the existence of autocorrelation may be gained by plotting the residuals
either against time or against their own lagged variables.
For instance, suppose plotting the residual against its lagged variable bring about the following
relationship.
[Figure: scatter of Ûₜ against Ûₜ₋₁, with most points in the first and third quadrants]
As the above figure reveals, most of the residuals are bunched in the first and the third
quadrants, suggesting very strongly that there is positive correlation in the residuals. However,
the graphical method we have just discussed is essentially subjective or qualitative in nature.
But there are quantitative tests that can be used to supplement the purely qualitative approach.
The most celebrated such test is based on the Durbin-Watson d statistic,
d = Σ(Ûₜ – Ûₜ₋₁)²/ΣÛₜ² ......................................................... (4.13)
which is simply the ratio of the sum of squared differences in successive residuals to the
residual sum of squares, RSS. Note that in the numerator of the d statistic the number of
observations is n – 1 because one observation is lost in taking successive differences. Note also that
expanding the above formula allows us to obtain, approximately,
d ≈ 2(1 – ρ̂) ......................................................... (4.14)
Although it is not used routinely, it is important to note the assumptions underlying the d-statistics
a) the regression model includes an intercept term
b) the explanatory variables are non-stochastic or fixed in repeated sampling
c) the disturbances Ut are generated by the first order autoregressive scheme.
Uₜ = ρUₜ₋₁ + εₜ
d) the regression model does not include lagged value(s) of the dependent variable as one
of the explanatory variables
e) there are no missing observations in the data
Note from the Durbin-Watson statistic that for positive autocorrelation (ρ > 0), successive
disturbance values will tend to have the same sign and the quantities (Uₜ – Uₜ₋₁)² will tend to be
small relative to the squares of the actual values of the disturbances. We can therefore expect
the value of the expression in equation (4.13) to be low. Indeed, for the extreme case ρ = 1 it is
possible that Uₜ = Uₜ₋₁ for all t, so that the minimum possible value of the expression is zero.
However, for negative autocorrelation, since positive disturbance values now tend to be
followed by negative ones and vice versa, the quantities (Uₜ – Uₜ₋₁)² will tend to be large relative
to the squares of the U’s. Hence, the value of (4.13) now tends to be high; the extreme case here
is ρ = –1, for which d approaches 4. Finally, when ρ = 0 we should expect, from (4.14), the d statistic
to take a value in the neighborhood of 2.
The Durbin-Watson test tests the hypothesis H₀: ρ = 0 (implying that the error terms are not
autocorrelated with a first-order scheme) against the alternative. However, the sampling
distribution of the d statistic depends on the sample size n, the number of explanatory
variables k, and also on the actual sample values of the explanatory variables. Thus, the critical
values at which we might, for example, reject the null hypothesis at the 5 percent level of
significance depend very much on the sample we have chosen. Notice that it is impracticable
to tabulate critical values for all possible sets of sample values. What is possible, however, is, for
given values of n and k, to find upper and lower bounds such that actual critical values for any
set of sample values will fall within these known limits. Tables are available which give these
upper and lower bounds for various levels of n and k and for specified levels of significance. (In
the appendices you can find the Durbin-Watson table.)
The Durbin-Watson test procedure in testing the null hypothesis of ρ = 0 against the alternative
hypothesis of positive autocorrelation is illustrated in the figure below.
Note that under the null hypothesis the actual sampling distribution of d, for the given n and k
and for the given sample X values, is shown by the unbroken curve. It is such that 5 percent of
the area beneath it lies to the left of the point d*, i.e., P(d < d*) = 0.05. If d* were known we
would reject the null hypothesis at the 5 percent level of significance if for our sample d < d*.
Unfortunately, for the reason given above, d* is unknown. The broken curves labeled dL and dU
represent, for given values of n and k, the lower and upper limits to the sampling distribution of
d within which the actual sampling distribution must lie whatever the sample X values.
[Figure: sampling distributions of d, with the bounding curves dL and dU around the actual distribution]
Note that tables for dU and dL are constructed to facilitate the use of one-tail rather than two-
tail tests. The following representation better explains the actual test procedure, which shows
that the limits of d are 0 and 4.
Note:
H₀: No positive autocorrelation
H₀*: No negative autocorrelation

0 to dL:            reject H₀ (evidence of positive autocorrelation)
dL to dU:           zone of indecision
dU to 4 – dU:       do not reject H₀ or H₀*
4 – dU to 4 – dL:   zone of indecision
4 – dL to 4:        reject H₀* (evidence of negative autocorrelation)
Note that from the above presentation we can develop the following rule of thumb: if d
is found to be close to 2 in an application, one may assume that there is no first-order
autocorrelation, either positive or negative. If d is close to 0, it is because ρ is close to 1,
indicating strong positive autocorrelation in the residuals. Similarly, the closer d is to 4, the
greater the evidence of negative serial correlation, because ρ is then close to –1.
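Computing d from OLS residuals is straightforward; the sketch below generates positively autocorrelated disturbances (hypothetical ρ) and shows that d falls well below 2:

```python
import numpy as np

# Computing the d statistic of (4.13) from OLS residuals.
def durbin_watson(U):
    return np.sum(np.diff(U)**2) / np.sum(U**2)

rng = np.random.default_rng(7)
n, rho = 20, 0.8
X = np.linspace(1, 10, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho * U[t - 1] + rng.normal()     # positively autocorrelated errors
Y = 2 + 3 * X + U

A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
res = Y - A @ b
print(durbin_watson(res))   # well below 2 suggests positive autocorrelation
```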
d) Remedial Measure
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measure.
If the source of the problem is suspected to be due to omission of important variables, the
solution is to include those omitted variables. Besides if the source of the problem is believed
to be the result of misspecification of the model, then the solution is to determine the
appropriate mathematical form.
If the above approaches are ruled out, the appropriate procedure will be to transform the
original data so that we can come up with a new form (or model) which satisfies the assumption
of no serial correlation. Of course, the transformation depends on the nature of the serial
correlation. The following stepwise procedure is commonly used.
Step 1: Apply OLS to the original model and obtain the estimated residuals Ûₜ.
Step 2: Using the estimated residuals, run the following regression:
Ûₜ = ρ̂Ûₜ₋₁ + vₜ
Step 3: Using the ρ̂ obtained from the step 2 regression, run the generalized difference equation
similar to (4.20) as follows:
(Yₜ – ρ̂Yₜ₋₁) = β₀(1 – ρ̂) + β₁(Xₜ – ρ̂Xₜ₋₁) + (Uₜ – ρ̂Uₜ₋₁)
or Yₜ* = β₀* + β₁*Xₜ* + Ûₜ*
Step 4: Since a priori it is not known that the ρ̂ obtained from the regression in step 2 is the
best estimate of ρ, substitute the values of β̂₀* and β̂₁* obtained from the regression in
step 3 into the original regression (4.21) and obtain the new residuals, say Ûₜ**, as
Ûₜ** = Yₜ – β̂₀* – β̂₁*Xₜ
Note that this can be easily computed since Yₜ, Xₜ, β̂₀* and β̂₁* are all known.
Step 5: Now estimate the regression
Ûₜ** = ρ̂̂·Ûₜ₋₁** + wₜ
where ρ̂̂ is the second-round estimate of ρ.
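One pass of steps 1-3 of this procedure can be sketched as follows (hypothetical data; steps 4-5 would repeat the same computations with the updated residuals):

```python
import numpy as np

# A sketch of one pass of the stepwise procedure above.
def ols(A, y):
    b = np.linalg.lstsq(A, y, rcond=None)[0]
    return b, y - A @ b

rng = np.random.default_rng(8)
n, rho_true = 100, 0.7
X = np.linspace(1, 10, n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = rho_true * U[t - 1] + rng.normal()
Y = 2 + 3 * X + U

# Step 1: OLS on the original model; Step 2: estimate rho from residuals.
_, u = ols(np.column_stack([np.ones(n), X]), Y)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1]**2)

# Step 3: generalized difference equation on the transformed variables.
Ys, Xs = Y[1:] - rho_hat * Y[:-1], X[1:] - rho_hat * X[:-1]
g, _ = ols(np.column_stack([np.ones(n - 1), Xs]), Ys)
b0_hat, b1_hat = g[0] / (1 - rho_hat), g[1]   # recover original parameters
print(rho_hat, b0_hat, b1_hat)
```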
4.5 MULTICOLLINEARITY
a) The nature of the problem
One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity
among the regressors included in the regression model. Note that although the assumption is said to be violated
only in the case of exact multicollinearity (i.e., an exact linear relationship among some of the regressors), the
presence of multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating
problems important enough to warrant our treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among any of
the regressors; it depends on the existence of an approximate linear relationship in the data set
at hand. Unlike most other estimating problems, this problem is caused by the particular sample
available. Multicollinearity in the data could arise for several reasons. For example, the
independent variables may all share a common time trend, one independent variable might be
the lagged value of another that follows a trend, some independent variable may have varied
together because the data were not collected from a wide enough base, or there could in fact
exist some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity will affect seriously the parameter estimates.
Intuitively, when any two explanatory variables are changing in nearly the same way, it
becomes extremely difficult to establish the influence of each one regressors on the dependent
variable separately. That is, if two explanatory variables change by the same proportion, the
influence on the dependent variable by one of the explanatory variables may be erroneously
attributed to the other. Their effect cannot be sensibly investigated, due to the high inter
correlation.
b) Consequences of Multicollinearity
In the case of near or high multicollinearity, one is likely to encounter the following
consequences
i) Although BLUE, the OLS estimators have large variances and covariances, making
precise estimation difficult. This is clearly seen through the formula for the variance of the
estimators. For example, in a multiple linear regression, Var(β̂₁) can be written as follows:
Var(β̂₁) = σ² / [Σx₁ᵢ²(1 – r₁₂²)]
It is apparent from the above formula that as r₁₂ (which is the coefficient of correlation
between X₁ and X₂) tends towards 1, that is, as collinearity increases, the variance of the
estimator increases. The same holds for Var(β̂₂) and Cov(β̂₁, β̂₂).
ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t-ratios of one or more coefficients tend to be statistically
insignificant.
iv) Although the t-ratios of one or more coefficients are statistically insignificant, R², the overall
measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data.
That is when few observations are included, the pattern of relationship may change and
affect the result.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the
new (future) sample observation. That is, if collinearity exists on the data of the past 15
years sample, and if collinearity is expected to be the same for the future sample period,
then forecasting will not be a problem.
i) High R² but few significant t-ratios: If R² is high, say in excess of 0.8, the F test in most
cases will reject the hypothesis that the partial slope coefficients are simultaneously equal
to zero, but the individual t tests will show that none or very few of the partial slope
coefficients are statistically different from zero.
ii) High pair-wise correlations among regressors: If the pair-wise correlation coefficient
between two regressors is high, say in excess of 0.8, then multicollinearity is a serious
problem.
iii) Auxiliary regressions: Since multicollinearity arises because one or more of the
regressors are exact or approximate linear combinations of the other regressors, one way
of finding out which X variable is related to the other X variables is to regress each Xᵢ on the
remaining X variables and compute the corresponding R², which will help to decide about the
problem. For example, consider the following auxiliary regression:
Xₖ = α₁X₁ + α₂X₂ + … + αₖ₋₁Xₖ₋₁ + V
If the R² of the above regression is high, it implies that Xₖ is highly correlated with the rest
of the explanatory variables, and one may hence drop Xₖ from the model.
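The auxiliary-regression idea can be sketched as follows, with X₂ deliberately constructed as a near-linear function of X₁ (hypothetical data):

```python
import numpy as np

# Auxiliary-regression sketch: regress each X on the remaining X's and
# inspect the resulting R^2.
rng = np.random.default_rng(9)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = 2 * X1 + rng.normal(0, 0.3, n)     # nearly a linear function of X1
X3 = rng.uniform(0, 10, n)              # unrelated regressor

def aux_r2(target, others):
    A = np.column_stack([np.ones(n)] + others)
    b = np.linalg.lstsq(A, target, rcond=None)[0]
    u = target - A @ b
    return 1 - np.sum(u**2) / np.sum((target - target.mean())**2)

print(aux_r2(X2, [X1, X3]))   # close to 1: X2 is nearly collinear with X1
print(aux_r2(X3, [X1, X2]))   # close to 0: X3 is not
```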
d) Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient
estimators in which the researcher is interested have unacceptably high variances. Thus, the
econometrician should not worry about multicollinearity if the R² from the regression exceeds
the R² of any independent variable regressed on the other independent variables. Nor need the
researcher worry about multicollinearity if the t-statistics are all greater than 2. Because
multicollinearity is essentially a sample problem, there are no infallible guides; however, one
may try the following rules of thumb.
a) Increase the size of the sample: As the sample size increases, Σx₁ᵢ² will generally increase.
Thus, for any given r₁₂, the variance of β̂₁ will decrease, thus decreasing the standard error,
which will enable us to estimate β₁ more precisely.
b) Drop a variable: When faced with severe multicollinearity, one of the “simplest” things to
do is to drop one of the collinear variables. But note that in dropping a variable from the
model we may be committing a specification bias or specification error. Specification bias
arises from incorrect specification of the model used in the analysis. Thus, if economic
theory requires some variable to be included in the model, dropping it due to the
multicollinearity problem would constitute specification bias. This is because we
would be dropping a variable whose true coefficient in the equation being estimated is not
zero.
c) Transformation of variables: In time series analysis, one reason for high
multicollinearity between two variables is that over time both variables tend to move in
the same direction. One way of minimizing this dependence is to transform the variables.
That is, suppose Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + Uₜ.
This relation must also hold at time t – 1 because the origin of time is arbitrary anyway.
Therefore we have
Yₜ₋₁ = β₀ + β₁X₁ₜ₋₁ + β₂X₂ₜ₋₁ + Uₜ₋₁
Subtracting this from the above gives
Yₜ – Yₜ₋₁ = β₁(X₁ₜ – X₁ₜ₋₁) + β₂(X₂ₜ – X₂ₜ₋₁) + Vₜ
This is known as the first difference form because we run the regression not on the original
variables but on the differences of successive values of the variables. The first difference
regression model often reduces the severity of multicollinearity because, although the levels of
the variables may be highly correlated, their first differences usually are not.
1. State with reasons whether the following statements are true, false, or uncertain.
a) Despite perfect multicollinearity, OLS estimators are BLUE.
b) If an auxiliary regression shows that a particular R² is high, there is definite
evidence of high collinearity.
2. In data involving economic time series such as GDP, income, prices, unemployment, etc.,
multicollinearity is usually suspected. Why?
3. State three remedial measures if multicollinearity is detected.
4.6 SUMMARY
- In the presence of heteroscedasticity, the variances of the OLS estimators are not provided by the
usual OLS formulas. If we persist in using the usual OLS formulas, the t and F tests
based on them can be highly misleading, resulting in erroneous conclusions.
- Autocorrelation can arise for several reasons and makes the OLS estimators inefficient.
The remedy depends on the nature of the interdependence among the disturbances Uₜ.
- Multicollinearity is a question of degree and not of kind. Although there are no sure
methods of detecting collinearity, there are several indicators of it.
1 a) False. Though the OLS estimates are inefficient in the presence of
heteroscedasticity, they are still statistically unbiased.
4. F* = ΣÛ₂²/ΣÛ₁² = 769,899.2/144,771.5 ≈ 5.3
This well exceeds the relevant 5% critical F value, so there is evidence of heteroscedasticity.
Answer To Check Your Progress 2
4. d* = Σ(Ûₜ – Ûₜ₋₁)²/ΣÛₜ² = 537,192/573,069 = 0.937
From the Durbin-Watson table, with a 5 percent level of significance, n = 20 and k = 1, we find
that dL = 1.20 and dU = 1.41. Since d* = 0.937 is less than dL = 1.20, we conclude that there is
positive autocorrelation in the import function.
Answer To Check Your Progress 3
2. This is because the variables are highly interrelated. For example, an increase in income
brings about an increase in GDP. Moreover, an increase in unemployment usually brings
about a decline in prices.
3. Refer the text for the answer
Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5% level of significance.
Content
5.0 Aims and Objectives
5.1 Introduction
5.2 Models with Binary Regressors
5.3 Non-Linear Regression Models
5.3.1 Non-Linear Relationships in Economics
5.3.2 Specification and Estimation of Non-Linear Models
5.3.2.1 Polynomials
5.3.2.2 Log-log Models
5.3.2.3 Semi-log Models
5.3.2.4 Reciprocal Model
5.4 Summary
5.5 Answers to Check Your Progress
5.6 References
5.7 Model Examination Questions
This unit aims at introducing models with binary explanatory variable(s) and specification and
estimation of non-linear models.
5.1 INTRODUCTION
As mentioned in the previous section, this unit deals with the role of qualitative
explanatory variables in regression analysis and the functional forms of some non-linear
regression models. It will be shown that the introduction of qualitative variables, often called
dummy variables, makes the linear regression model a very flexible tool.
Example: If an individual is male, the variable takes the value 1;
if female, 0.
Variables which assume such 0 and 1 values are called dummy variables or binary variables or
qualitative variables or categorical variables or dichotomous variables.
Now let us take some examples with a single quantitative explanatory variable and two or more
qualitative explanatory variables.
Example 1: Suppose a researcher wants to find out whether sex makes any difference in a
college teacher’s salary, assuming that all other variables such as age, education level,
experience, etc. are held constant.
Consider the following hypothetical data on starting salaries of college teachers by sex.
Starting salary Sex
(Y) (1 = male, 0 = female)
22,000 1
19,000 0
18,000 0
21,700 1
18,500 0
21,000 1
20,500 1
17,000 0
17,500 0
21,200 1
Suppose the fitted model is Yᵢ = β₀ + β₁Dᵢ + Uᵢ, where D = 1 for male and 0 for female.
Since β̂₁ is statistically significant, the results indicate that the mean salaries of the two
categories are different; actually the female teacher’s average salary is lower than her male
counterpart’s. If all other variables are held constant, there is sex discrimination in the salaries of
the two sexes.
The estimates are β̂₀ = 18,000 (the mean female salary) and β̂₁ = 3,280 (the salary differential),
so that β̂₀ + β̂₁ = 21,280 (the mean male salary).
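The group-mean interpretation of the dummy coefficients can be verified directly from the table's data:

```python
import numpy as np

# Regressing salary on the sex dummy returns the two group means,
# with beta_1 the male-female differential.
Y = np.array([22000, 19000, 18000, 21700, 18500,
              21000, 20500, 17000, 17500, 21200], dtype=float)
D = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1], dtype=float)

A = np.column_stack([np.ones(len(Y)), D])
b0, b1 = np.linalg.lstsq(A, Y, rcond=None)[0]
print(b0, b1, b0 + b1)   # 18000 (female mean), 3280, 21280 (male mean)
```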
Example 2: Let us consider a regression with both quantitative and qualitative explanatory
variables, by including one quantitative explanatory variable in the model given in example 1
above:
Yᵢ = β₀ + β₁Dᵢ + β₂Xᵢ + Uᵢ
The female teacher is known as the base category since it is assigned the value of 0.
Note that the assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D = 1 for female and D = 0
for male. But in interpreting the results of the models which use the dummy variables it is
critical to know how the 1 and 0 values are assigned.
The coefficient 0 (intercept) is the intercept term for the base category. The coefficient 1
attached to the dummy variable D can be called the differential intercept coefficient because it
tells by how much the value of the intercept term of the category that receives the value of 1
differs from the intercept coefficient of the base category.
The other important point is on the number of dummy variables to be included in the model. If
a qualitative variable has m categories, introduce only m–1 dummy variables. In the above
examples, sex has two categories, and hence we introduced only a single dummy variable. If
this rule is not followed, we shall fall in to what might be called the dummy-variable trap, that
is, the situation of perfect multicollinearity.
Example 3: Let us take an example of regression on one quantitative variable and one
qualitative variable with more than two classes. Suppose we want to regress the annual
expenditure on health care by an individual on the income and education of the individual. Since
the variable education is qualitative in nature, we can have, as an example, three mutually
exclusive levels of education:
- Less than high school
- High school
- College
The number of dummies = 3 – 1 = 2. (Note the rule)
Let us consider the “less than high school education” category as the base category. The model
can be formulated as follows:
Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃Xᵢ + Uᵢ
where Yᵢ = annual expenditure on health care, X = income,
D₁ = 1 for high school education and 0 otherwise,
D₂ = 1 for college education and 0 otherwise.
[Figure 5.2: Expenditure on health care in relation to income for three levels of education,
showing three parallel lines for less than high school (the base category), high school, and
college education]
The intercept β₀ is the intercept of the base category. The differential intercepts β₁ and β₂
tell by how much the intercepts of the other two categories differ from the intercept of the base
category.
The technique of dummy variables can be easily extended to handle more than one qualitative
variable. If you consider example 1 above, it is possible to introduce another dummy variable,
for example, the color of the teacher, as an explanatory variable. Hence we will have an additional
dummy variable for color, i.e.,
D₂ = 1 if white and 0 otherwise.
Therefore, it is possible to include more than one quantitative variable and more than two
qualitative variables in our linear regression model.
The purpose of this section is to introduce you with models that are linear in the parameters but
non linear in the variables.
The assumption of linear relationship between the dependent and the explanatory variables may
not be acceptable for many economic relationships. Given the complexity of the real world we
expect non-linearities in most economic relationships.
Example 1: Cost functions are usually non-linear.
[Figure: U-shaped average total cost (ATC) curve]
Example 2: Production functions.
[Figure: total product (TP) curve plotted against input]
Other economic functions like demand, supply, income-consumption curves, etc can also be
non-linear.
Example 1: Polynomial models take the form
Y = β₀ + β₁X₁ + β₂X₁² + β₃X₁³ + … + U
If we consider the U-shaped average cost curve,
C = β₀ + β₁X – β₂X² + β₃X³ + U
where C = total cost and X = output.
To fit this model we need to transform some of the variables.
Example 2: Suppose we have data on the yield of wheat and the amount of fertilizer applied. Assume
that increased amounts of fertilizer begin to burn the crop, causing the yield to decline.
Y X X2
55 1 1
70 2 4
75 3 9
65 4 16
60 5 25
We want to fit the second-degree equation
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₁ᵢ² + Uᵢ
Let X₁ᵢ² = W. Then
Yᵢ = β₀ + β₁X₁ᵢ + β₂W + Uᵢ
This is linear both in terms of parameters and variables, so we apply OLS to the above function.
The results are presented as follows:
Ŷᵢ = 36 + 24.07Xᵢ – 3.9Xᵢ²
s.e. =     (6.471) (1.059)
t =         3.72   –3.71
t₀.₀₅(5 – 3) = 2.92
Since the computed t values exceed the critical value in absolute terms, both coefficients are significant.
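The fitted second-degree equation can be reproduced with a short sketch (the W = X² substitution makes the model linear in the parameters, so ordinary OLS applies):

```python
import numpy as np

# Fitting the second-degree equation of Example 2 with the substitution W = X^2.
Y = np.array([55, 70, 75, 65, 60], dtype=float)
X = np.array([1, 2, 3, 4, 5], dtype=float)

A = np.column_stack([np.ones(5), X, X**2])   # columns: 1, X, W = X^2
b = np.linalg.lstsq(A, Y, rcond=None)[0]
print(b)   # approximately 36, 24.07, -3.93, matching the quoted fit
```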
Example: 1. ln Yᵢ = β₀ + β₁Xᵢ + Uᵢ
2. Yᵢ = β₀ + β₁·ln Xᵢ + Uᵢ
The above models are called semilog models. We call the first model log-lin model and the
second model is known as lin-log model. The name given to the above models is based on
whether the dependent variable or the explanatory variable is in the log form.
Multiplying the relative change in Y by 100 will give you the percentage change in Y for an
absolute change in X.
Example: ln GN̂Pₜ = 6.96 + 0.027T
              s.e. = (0.015) (0.012)
r² = 0.95,  F₁,₁₃ = 260.34
where GNP = real gross national product and T = time (in years).
The above result shows that the real GNP of the country was growing at the rate of 2.7 percent
per year (for the sample period). It is also possible to estimate a linear trend model:
GN̂Pₜ = 1040.11 + 35T
           (18.9)  (2.07)
r² = 0.95,  F₁,₁₃ = 284.7
This model implies that for the sample period the real GNP was growing at the constant
absolute amount of about $35 billion a year. The choice between the log-lin and linear model
will depend up on whether one is interested in the relative or the absolute change in the GNP.
NB: You cannot compare the r² values of the two models since the dependent variables are
different.
The reciprocal model, Yᵢ = β₀ + β₁(1/Xᵢ) + Uᵢ, is non-linear in
the variable X because X enters inversely or reciprocally, but the model is linear in β₀ and β₁ and is
therefore a linear regression model. The method of OLS can be applied to estimate the model.
If we let 1/Xᵢ = Zᵢ, the model becomes Yᵢ = β₀ + β₁Zᵢ + Uᵢ, which is linear in both parameters and variables.
As X increases indefinitely, Y approaches the limiting or asymptotic value β₀. Some examples are shown below.
[Figure 5.4: Three shapes of the reciprocal model Yᵢ = β₀ + β₁(1/Xᵢ), panels (a), (b) and (c)]
We can give examples for each of the above functions (figures a, b and c):
1. The average fixed cost curve relates the average fixed cost of production to the level of
output. As indicated in fig. (a), the AFC declines continuously as output increases.
2. The Phillips curve, which relates the unemployment rate to the rate of inflation, is a
good example of fig. (b) above.
3. The reciprocal model of fig. (c) is an appropriate Engel expenditure curve, which relates a
consumer’s expenditure on a commodity to his total expenditure or income.
5.4 SUMMARY
Dummy variables
Variables which assume 0 and 1 values are called dummy variables; they are also known as
binary variables, qualitative variables, categorical variables or dichotomous variables.
Reciprocal models
The function defined as
Yᵢ = β₀ + β₁(1/xᵢ) + Uᵢ is known as a reciprocal model.
5.2.1
1 a) Yᵢ = β₀ + β₁Xᵢ + U
5.6 REFERENCES
3. The following table gives data on the annual percentage change in wage rates (Y) and the
unemployment rate (X) for a country for the period 1950 – 1966.
Percentage increase in wage rates (Y)    Unemployment rate (%) (X)
Contents
6.0 Aims and Objective
6.1 Introduction
6.2 Simultaneous Dependence of Economic Variables
6.3 Identification Problem
6.4 Test of Simultaneity
6.5 Approaches to Estimation
6.6 Summary
6.7 Answers to Check Your Progress
6.8 Model Examination
6.9 References
The purpose of this unit is to introduce the student, very briefly, to the concept of
simultaneous dependence of economic variables. Thus, when the student has completed this
unit he/she will:
understand the concept of simultaneous equations
distinguish between endogenous and exogenous variables in a model
be able to derive reduced form equations from structural equations
6.1 INTRODUCTION
The application of least squares to a single equation assumes, among others, that the
explanatory variables are truly exogenous, that there is one-way causation between the
dependent variable (Y) and the explanatory variables (X). When this is not the case, the
function cannot be treated in isolation as a single equation model, but belongs to a wider system
of equations which describes the relationships among all the relevant variables. In such cases we
must use a multi-equation model which would include separate equations in which Y and X
would appear as endogenous variables. A system describing the joint dependence of variables is
called a system of simultaneous equations.
In the single equations discussed in the previous units the cause-and-effect relationship is
unidirectional: the explanatory variables are the cause and the dependent variable is the
effect.
However, there are situations where there is a two-way flow of influence among economic
variables; that is, one economic variable affects another economic variable(s) and is, in turn,
affected by it (them). In such cases we need to consider more than one equation and thus come up
with simultaneous equation models, in which there is one regression equation for each
jointly dependent (endogenous) variable.
The first question we need to answer is: what happens if the parameters of each equation are estimated by applying, say, the method of OLS, disregarding the other equations in the system? Recall that one of the crucial assumptions of the method of OLS is that the explanatory X variables are either non-stochastic or, if stochastic (random), distributed independently of the stochastic disturbance term. If neither of these conditions is met, the least-squares estimators are not only biased but also inconsistent; that is, as the sample size increases indefinitely, the estimators do not converge to their true (population) values.
Example: Recall that the price of a commodity and the quantity (bought and sold) are determined by the intersection of the demand and supply curves for that commodity. Consider the following linear demand and supply models.
Demand function: Qtd = α0 + α1Pt + U1t ……(6.3)
Supply function: Qts = β0 + β1Pt + U2t ……(6.4)
where Qtd = quantity demanded, Qts = quantity supplied, P = price and t = time.
Note that P and Q are jointly dependent variables. If U1t changes because of changes in other variables affecting Qtd (such as income and tastes), the demand curve shifts. Recall that such a shift in demand changes both P and Q. Similarly, a change in U2t (because of changes in weather and the like) will shift the supply curve, again affecting both P and Q. Because of this simultaneous dependence between Q and P, U1t and Pt in (6.3), and U2t and Pt in (6.4), cannot be independent. Therefore, a regression of Q on P as in (6.3) would violate an important assumption of the classical linear regression model, namely, the assumption of no correlation between the explanatory variable(s) and the disturbance term. In summary, the above discussion reveals that, in contrast to single equation models, in simultaneous equation models more than one dependent, or endogenous, variable is involved, necessitating as many equations as the number of endogenous variables. As a consequence such an endogenous explanatory variable becomes stochastic and is usually correlated with the disturbance term of the equation in which it appears as an explanatory variable.
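The bias can be seen in a small simulation. The following sketch (simulated data; all parameter values illustrative) generates equilibrium P and Q from the demand-and-supply model (6.3)-(6.4) and then regresses Q on P by OLS as if (6.3) were a single-equation model; the estimated slope is far from the true demand slope because P is correlated with U1t.

import numpy as np

rng = np.random.default_rng(0)
a0, a1 = 10.0, -1.0          # true demand intercept and slope (illustrative)
b0, b1 = 2.0, 1.0            # true supply intercept and slope (illustrative)
n = 10_000
u1 = rng.normal(0, 1, n)     # demand disturbance
u2 = rng.normal(0, 1, n)     # supply disturbance

# Market clearing: a0 + a1*P + u1 = b0 + b1*P + u2, solved for P
P = (b0 - a0 + u2 - u1) / (a1 - b1)
Q = a0 + a1 * P + u1         # equilibrium quantity (from the demand equation)

# OLS of Q on P, ignoring the rest of the system
X = np.column_stack([np.ones(n), P])
slope = np.linalg.lstsq(X, Q, rcond=None)[0][1]
print("true demand slope:", a1, "  OLS estimate:", round(slope, 3))  # near 0, not -1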
Recall the simple Keynesian model of income determination: the consumption function Ct = β0 + β1Yt + Ut ……(6.6) and the income identity Yt = Ct + It ……(6.7). Both equations 6.6 and 6.7 are structural, or behavioral, equations because they portray the structure of an economy, equation (6.7) being an identity. The β's are known as the structural parameters or coefficients. From the structural equations one can solve for the endogenous variables and derive the reduced-form equations and the associated reduced-form coefficients. If equation (6.6) is substituted into equation (6.7) and we solve for Yt, we obtain
Yt = β0/(1 − β1) + (1/(1 − β1))It + Ut/(1 − β1)
= π0 + π1It + Wt ……(6.8)
where π0 = β0/(1 − β1), π1 = 1/(1 − β1) and Wt = Ut/(1 − β1).
The reduced-form coefficients (the π's) are also known as impact, or short-run, multipliers, because they measure the immediate impact on the endogenous variable of a unit change in the value of the exogenous variable. If in the preceding Keynesian model the investment expenditure (I) is increased by, say, $1 and if the marginal propensity to consume (i.e., β1) is assumed to be 0.8, then from (6.8) we obtain π1 = 1/(1 − 0.8) = 5. This result means that increasing investment by $1 will immediately (i.e., in the current time period) lead to an increase in income of $5, that is, a fivefold increase.
Notice an interesting feature of the reduced-form equations. Since only the predetermined variables and stochastic disturbances appear on the right-hand side of these equations, and since the predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS method can be applied to estimate the coefficients of the reduced-form equations (the π's). This is sufficient if the researcher is only interested in predicting the endogenous variables or only wishes to estimate the size of the multipliers (i.e., the π's).
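A minimal sketch of this idea, with simulated data and illustrative parameter values: the reduced form (6.8) is estimated by OLS, and since π1 = 1/(1 − β1), the MPC can be recovered as β1 = 1 − 1/π1.

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 50.0, 0.8           # true consumption-function parameters (illustrative)
n = 500
I = rng.uniform(10, 100, n)        # exogenous investment
U = rng.normal(0, 5, n)
Y = (beta0 + I + U) / (1 - beta1)  # reduced form implied by (6.6) and (6.7)

X = np.column_stack([np.ones(n), I])
pi0_hat, pi1_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print("investment multiplier pi1:", round(pi1_hat, 2))  # close to 1/(1 - 0.8) = 5
print("implied MPC beta1:", round(1 - 1 / pi1_hat, 2))  # close to 0.8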
Note that the identification problem is a mathematical (as opposed to statistical) problem associated with simultaneous equation systems. It is concerned with the question of whether it is possible to obtain meaningful estimates of the structural parameters from the reduced-form coefficients. An identified equation is said to be exactly (or fully or just) identified if unique numerical values of its structural parameters can be obtained. It is said to be over identified if more than one numerical value can be obtained for some of the parameters of the structural equations. The circumstances under which each of these cases occurs are shown in the following discussion.
a) Under Identification
Consider the demand-and-supply model (6.3) and (6.4), together with the market clearing, or equilibrium, condition (6.5) that demand equals supply. By the equilibrium condition (i.e., Qtd = Qts) we obtain the equilibrium price
Pt = π0 + Vt ……(6.11)
where π0 = (β0 − α0)/(α1 − β1) and Vt = (U2t − U1t)/(α1 − β1)
Substituting Pt from (6.11) into (6.3) or (6.4) we obtain the following equilibrium quantity:
Qt = π1 + Wt ……(6.12)
where π1 = (α1β0 − α0β1)/(α1 − β1)
and Wt = (α1U2t − β1U1t)/(α1 − β1)
Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters: α0, α1, β0 and β1. But there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients. Recall from high school algebra that to estimate four unknowns we must have four (independent) equations; in general, to estimate k unknowns we must have k (independent) equations. What all this means is that, given time series data on P (price) and Q (quantity) and no other information, there is no way the researcher can be sure whether he/she is estimating the demand function or the supply function. That is, a given Pt and Qt represent simply the point of intersection of the appropriate demand and supply curves, because of the equilibrium condition that demand is equal to supply.
b) Just (or Exact) Identification
Suppose now that income (It) enters the demand function and lagged price (Pt−1) enters the supply function:
Demand function: Qt = α0 + α1Pt + α2It + U1t ……(6.13)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……(6.14)
Equating demand to supply yields the equilibrium price
Pt = π0 + π1It + π2Pt−1 + Vt ……(6.16)
where π0 = (β0 − α0)/(α1 − β1), π1 = −α2/(α1 − β1),
π2 = β2/(α1 − β1), Vt = (U2t − U1t)/(α1 − β1)
Substituting the equilibrium price (6.16) into the demand or supply equation of (6.13) or (6.14), we obtain the corresponding equilibrium quantity:
Qt = π3 + π4It + π5Pt−1 + Wt ……(6.17)
where the reduced-form coefficients are
π3 = (α1β0 − α0β1)/(α1 − β1), π4 = −α2β1/(α1 − β1),
π5 = α1β2/(α1 − β1), Wt = (α1U2t − β1U1t)/(α1 − β1)
The demand-and-supply model given in equations (6.13) and (6.14) contains six structural coefficients, α0, α1, α2, β0, β1 and β2, and there are six reduced-form coefficients, π0, π1, π2, π3, π4 and π5, with which to estimate them. Thus we have six equations in six unknowns, and normally we should be able to obtain unique estimates. Therefore the parameters of both the demand and supply equations can be identified, and the system as a whole is identified.
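In the just-identified case the structural coefficients can be recovered uniquely from the reduced-form coefficients ("indirect least squares"). A sketch of the arithmetic, using a set of reduced-form values constructed to be internally consistent (all numbers illustrative); the inversion formulas follow from the expressions for π0 through π5 above:

# Hypothetical reduced-form coefficients (consistent with alpha = (10, -1, 0.5)
# and beta = (2, 1, 0.2) in the model (6.13)-(6.14))
pi0, pi1, pi2 = 4.0, 0.25, -0.1
pi3, pi4, pi5 = 6.0, 0.25, 0.1

alpha1 = pi5 / pi2                 # = -1   (since pi5/pi2 = alpha1)
beta1 = pi4 / pi1                  # =  1   (since pi4/pi1 = beta1)
alpha0 = pi3 - alpha1 * pi0        # = 10
beta0 = pi3 - beta1 * pi0          # =  2
alpha2 = pi1 * (beta1 - alpha1)    # = 0.5
beta2 = pi2 * (alpha1 - beta1)     # = 0.2
print(alpha0, alpha1, alpha2, beta0, beta1, beta2)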
c) Over Identification
Note that for certain goods and services, wealth of the consumer is another important determinant of demand. Therefore, the demand function (6.13) can be modified as follows, keeping the supply function as before:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t ……(6.18)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……(6.19)
where R represents wealth.
Equating demand to supply, we obtain the following equilibrium price and quantity:
Pt = π0 + π1It + π2Rt + π3Pt−1 + Vt ……(6.20)
Qt = π4 + π5It + π6Rt + π7Pt−1 + Wt ……(6.21)
The system now has seven structural coefficients (α0, α1, α2, α3, β0, β1 and β2) but eight reduced-form coefficients (π0 through π7). Notice that the situation is the opposite of the case of under identification, where there was too little information: here there is too much, and more than one value can be obtained for some parameters (for example, β1 can be estimated either as π5/π1 or as π6/π2, and the two estimates will generally differ in a sample), so the supply equation is over identified. The only way in which the structural parameters of unidentified (or under identified) equations can be estimated is to respecify the model using additional information.
In a simple example such as the foregoing it is easy to check for identification; in more complicated systems, however, it is not so easy. This time-consuming procedure can be avoided by resorting to either the order condition or the rank condition of identification. Although the order condition is easy to apply, it provides only a necessary condition for identification. The rank condition, on the other hand, is both a necessary and a sufficient condition for identification. [Note: the order and rank conditions for identification will not be discussed here, since the objective of this unit is only to briefly introduce the reader to simultaneous equations. For a detailed and advanced discussion, readers can refer to the reference list stated at the end of this unit.]
where P̂t are the estimated Pt and V̂t are the estimated residuals. Substituting (6.27) into (6.23) gives regression (6.28), in which both P̂t and V̂t appear as regressors. Now, under the null hypothesis that there is no simultaneity, the correlation between V̂t and U2t should be zero, asymptotically. Thus, if we run the regression (6.28) and find that the coefficient of V̂t in (6.28) is statistically zero, we can conclude that there is no simultaneity problem.
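A sketch of how this two-step test might be run (simulated data; variable names and parameter values are illustrative, and statsmodels is assumed to be available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
I = rng.uniform(0, 10, n)                  # exogenous demand shifter (income)
u1 = rng.normal(0, 1, n)
u2 = rng.normal(0, 1, n)
P = 4 + 0.25 * I + (u1 - u2) / 2           # simulated equilibrium price
Q = 10 - P + 0.5 * I + u1                  # simulated equilibrium quantity (demand)

# Step 1: regress P on the predetermined variable(s), keep the residuals V_hat
step1 = sm.OLS(P, sm.add_constant(I)).fit()
V_hat = step1.resid

# Step 2: add V_hat to the structural equation; a significant t-statistic on
# V_hat rejects the null hypothesis of no simultaneity
X = sm.add_constant(np.column_stack([P, I, V_hat]))
step2 = sm.OLS(Q, X).fit()
print(step2.tvalues)                       # last entry is the t on V_hat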
At the outset it may be noted that the estimation problem is rather complex because there is a variety of estimation techniques with varying statistical properties. In view of the introductory nature of this unit we shall consider the following techniques very briefly.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
It = α0 + α1Yt + Ut
Yt = Qt + It
Write the reduced-form equations for Yt and It.
We have seen that a unique feature of a simultaneous equation model is that the endogenous variable in one equation may appear as an explanatory variable in another equation of the system, so that the OLS method may not be applicable. The identification problem in this regard asks whether one can obtain unique numerical estimates of the structural coefficients from the estimated reduced-form coefficients. This leads to the issue of just identified, under identified and over identified equations. Note also that in the presence of simultaneity OLS is generally not applicable, so it is imperative to test for simultaneity explicitly. For this purpose the Hausman specification test can be used. There are several methods of estimating a simultaneous equation model.
Answer to the check-your-progress exercise above: substituting the investment function into the identity and solving gives
Yt = α0/(1 − α1) + (1/(1 − α1))Qt + (1/(1 − α1))Ut
It = α0/(1 − α1) + (α1/(1 − α1))Qt + (1/(1 − α1))Ut
1. What is the economic meaning of the imposition of a zero restriction on the parameters of a model?
2. Consider the following extended Keynesian model of income determination:
Ct = β0 + β1Yt + β2It + U1t
It = α0 + α1Yt−1 + U2t
Tt = γ0 + γ1Yt + U3t
Suppose the estimated reduced-form equations are:
Y1t = 4 + 8X1t
Y2t = 2 + 12X1t
a) Which structural coefficients, if any, can be estimated from the reduced-form coefficients?
b) Show that the reduced-form parameters measure the total effect of a change in the exogenous variables.
Contents
7.0 Aims and Objective
7.1 Introduction
7.2 Qualitative Response Models
7.2.1 Categories of Qualitative Response Model
7.3 The Linear Probability Model (LPM)
7.4 The Logit Model
7.5 The Probit Model
7.6 The Tobit Model
7.7 Summary
7.8 Answers to Check Your Progress Questions
7.9 References
7.10 Model Examination Questions
The purpose of this unit is to familiarize students with the concept of qualitative dependent
variable in a regression model and the estimating problems associated with such models.
7.1 INTRODUCTION
Example 1: Y = β0 + β1X1 + β2X2
Y = 1, if individual i attended college
= 0, otherwise
In the above example the dependent variable Y takes on only two values (i.e., 0 and 1). Conventional regression cannot be used directly to analyze a qualitative dependent variable model; such models are analyzed in the general framework of probability models.
ii. Ordinal variables: these are variables whose categories can be ranked.
Example: – Rank to indicate political orientation
Y = 1, radical
= 2, liberal
= 3, conservative
- Rank according to education attainment
Y = 1, primary education
= 2, secondary education
= 3, university education
iii. Nominal variables: these occur when there are multiple outcomes that cannot be ordered.
Example: occupation can be grouped as farming, fishing, carpentry, etc.
Y = 1, farming
= 2, fishing
= 3, carpentry
= 4, livestock
(note that the numbers are assigned arbitrarily)
iv. Count variables: these indicate the number of times some event has occurred.
Example: how many strikes have occurred.
Now let us turn our attention to the four most commonly used approaches to estimating binary response models (types of binomial models):
1. Linear probability models
2. The logit model
3. The probit model
4. The tobit (censored regression) model.
The above model expresses the dichotomous Yi as a linear function of the explanatory variable Xi. Such models are called linear probability models (LPM), since E(Yi/Xi), the conditional expectation of Yi given Xi, can be interpreted as the conditional probability that the event will occur given Xi; that is, Pr(Yi = 1/Xi). Thus, in the preceding case, E(Yi/Xi) gives the probability of a family owning a house when its income is the given amount Xi. The justification of the name LPM can be seen as follows. Letting Pi denote the probability that Yi = 1, the distribution of Yi is:
Yi = 0 with probability 1 − Pi
Yi = 1 with probability Pi
(the probabilities sum to 1)
Therefore, by the definition of mathematical expectation, we obtain
E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ……(3)
Now, comparing (2) with (3), we can equate
E(Yi/Xi) = β0 + β1Xi = Pi ……(4)
1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. To see this, note that Ui has the following probability distribution:
Yi = 0: Ui = −β0 − β1Xi, with probability 1 − Pi
Yi = 1: Ui = 1 − β0 − β1Xi, with probability Pi
Now, by definition, Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 by assumption. Therefore, using the preceding probability distribution of Ui, we obtain
Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
= (β0 + β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)   [using Pi = β0 + β1Xi]
= (β0 + β1Xi)(1 − β0 − β1Xi)
or Var(Ui) = E(Yi/Xi)[1 − E(Yi/Xi)] = Pi(1 − Pi)
This shows that the variance of Ui is heteroscedastic, because it depends on the conditional expectation of Y, which of course depends on the value taken by X. Thus the OLS estimator of β is inefficient and the standard errors are biased, resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbances (U's) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference (hypothesis testing, etc.). But the assumption of normality for Ui is no longer tenable for the LPM because, like Yi, Ui takes on only two values:
Ui = Yi − β0 − β1Xi
When Yi = 1, Ui = 1 − β0 − β1Xi, and when Yi = 0, Ui = −β0 − β1Xi.
Obviously Ui cannot be assumed to be normally distributed. Recall, however, that normality is not required for the OLS estimates to be unbiased.
3. Nonsensical Predictions
The LPM produces predicted values outside the normal range of probabilities (0, 1): it can predict values of Y that are negative or greater than 1. This is the real problem with OLS estimation of the LPM.
4. Functional Form
Since the model is linear, a unit increase in X results in a constant change of β1 in the probability of the event, holding all other variables constant. The increase is the same regardless of the current value of X. In many applications this is unrealistic. When the outcome is a probability, it is often substantively reasonable that the effects of the independent variables will diminish as the predicted probability approaches 0 or 1.
Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.
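The nonsensical-prediction problem is easy to demonstrate. A sketch with simulated home-ownership data (all values illustrative; statsmodels assumed available): an LPM is fitted by OLS, and some fitted "probabilities" fall outside [0, 1].

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
income = rng.uniform(0, 30, n)
p_true = 1 / (1 + np.exp(-(-4 + 0.3 * income)))   # true P(own house | income)
own = rng.binomial(1, p_true)                      # observed 0/1 outcome

lpm = sm.OLS(own, sm.add_constant(income)).fit()
fitted = lpm.fittedvalues
print("share of fitted values outside [0, 1]:",
      round(np.mean((fitted < 0) | (fitted > 1)), 3))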
We have seen that the LPM has many problems: non-normality of Ui, heteroscedasticity of Ui, the possibility of Ŷi lying outside the 0-1 range, and generally lower R² values. But these problems are surmountable. The fundamental problem with the LPM is that it is not logically attractive as a model, because it assumes that Pi = E(Y = 1/X) increases linearly with X; that is, the marginal or incremental effect of X remains constant throughout.
Example: The LPM for home ownership estimated by OLS is:
Ŷi = −0.9457 + 0.1021Xi, R² = 0.8048
The above regression is interpreted as follows:
- The intercept of −0.9457 gives the "probability" that a family with zero income will own a house. Since this value is negative, and since a probability cannot be negative, we treat it as zero.
- The slope value of 0.1021 means that for a unit change in income, on average the probability of owning a house increases by 0.1021, or about 10 percent. This is so irrespective of the level of income, which seems patently unrealistic. In reality one would expect Pi to be non-linearly related to Xi.
Therefore, what we need is a (probability) model with the following two features:
1. As Xi increases, Pi = E(Y = 1/X) increases but never steps outside the 0-1 interval.
2. The relationship between Pi and Xi is non-linear, that is, "one which approaches zero at slower and slower rates as Xi gets small and approaches one at slower and slower rates as Xi gets very large."
Geometrically, the model we want would look something like fig. 7.1 below.
[Figure 7.1: an S-shaped curve rising from 0 to 1, with the probability (CDF) on the vertical axis and X on the horizontal axis]
The above S-shaped curve closely resembles the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that the variable takes a value less than or equal to x.) Therefore, one can easily use a CDF to model regressions where the response variable is dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
a) the logistic, which gives rise to the logit model
b) the normal, which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not. Taking the natural log of the odds ratio we obtain the logit:
Li = ln[Pi/(1 − Pi)] = β0 + β1Xi
That is, the log of the odds ratio is linear in X. The intercept β0 gives the log-odds in favor of owning a house when income is zero; like most interpretations of intercepts, this may not have any physical meaning. Note that with individual-level data the model cannot be estimated by OLS, since Pi is then either 1 or 0 and the corresponding values of L are meaningless (L = ln(1/0) and L = ln(0/1)). Therefore estimation is by the maximum likelihood method (because of its mathematical complexity we will not discuss the method here).
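In practice the maximization is done by software. A minimal sketch fitting a logit by maximum likelihood on the simulated home-ownership data used in the LPM sketch above (statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
income = rng.uniform(0, 30, n)
p_true = 1 / (1 + np.exp(-(-4 + 0.3 * income)))
own = rng.binomial(1, p_true)

logit_res = sm.Logit(own, sm.add_constant(income)).fit(disp=0)
print(logit_res.params)      # ML estimates of beta0 (~ -4) and beta1 (~ 0.3)
print(logit_res.prsquared)   # McFadden pseudo-R2 (not comparable to an OLS R2)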
Example: Logit estimates. Assume that Y is related to the variables X1 through X5 as follows:
Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui
The logit estimation results are:
Ŷi = −10.84 − 0.74X1 − 11.6X2 − 5.7X3 − 1.3X4 + 2.5X5
t = (−3.20) (−2.51) (−3.01) (−2.4) (−1.37) (1.62)
The variables X1, X2 and X3 are statistically significant at the 99 percent level, and X4 at the 90 percent level. The estimated result shows that X1, X2 and X3 have a negative effect on the probability of the event occurring (i.e., Y = 1), while X5 (with positive β5) has a positive effect on that probability.
Note: The parameters of the logit model are not the same as the marginal effects we are used to when analyzing OLS results.
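The point can be made with a few lines of arithmetic: in the logit model the effect of X on the probability is dP/dX = β1·P(1 − P), so it varies with the level of X. A sketch with illustrative coefficient values (fitted statsmodels results objects also offer a get_margeff() method that computes such effects from estimates):

import numpy as np

beta0, beta1 = -4.0, 0.3        # illustrative logit coefficients
for x in (5.0, 13.3, 25.0):     # low, middle and high values of X
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    print(f"X = {x:5.1f}   P = {p:.3f}   dP/dX = {beta1 * p * (1 - p):.4f}")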
The estimating model that emerges from the normal CDF is popularly known as the probit model. Here the observed dependent variable Y takes on the value 0 or 1 according to the following criterion. Define a latent variable Y* such that
Yi* = β′Xi + εi
Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
The latent variable Y* is continuous (−∞ < Y* < ∞); it generates the observed binary variable Y. The observed variable Y can be in one of two states:
i) if the event occurs, it takes the value 1
ii) if the event does not occur, it takes the value 0
The latent variable is assumed to be a linear function of the observed X's through the structural model.
Example: Let Y measure whether one is employed or not; it is a binary variable taking values 0 and 1. Let Y* measure the willingness to participate in the labor market; it changes continuously and is unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor market increases (Y*, the willingness to participate, cannot be observed). The individual's decision changes (Y becomes zero) if the wage rate is below the critical point.
Since Y* is continuous, the model avoids the problems inherent in the LPM (the non-normality of the error term and heteroscedasticity). However, since the latent dependent variable is unobserved, the model cannot be estimated using OLS; maximum likelihood is used instead. Most often the choice is between normal errors and logistic errors, resulting in the probit (normit) and logit models, respectively. The coefficients derived from the maximum likelihood (ML) function will be those of the probit model if we assume a normal distribution, and those of the logit model if we assume that the appropriate distribution of the error term is logistic. As Amemiya suggests, a logit estimate of a parameter multiplied by 0.625 gives a fairly good approximation of the probit estimate of the same parameter. Similarly, the coefficients of the LPM and logit models are related as follows:
βLPM = 0.25 βlogit, except for the intercept
βLPM = 0.25 βlogit + 0.5 for the intercept
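Amemiya's rule of thumb is easy to check by fitting both models to the same data. A sketch on simulated data (illustrative values; statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(0, 1, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))
X = sm.add_constant(x)

logit_b = sm.Logit(y, X).fit(disp=0).params
probit_b = sm.Probit(y, X).fit(disp=0).params
print("probit coefficients:  ", probit_b)
print("0.625 * logit coeffs: ", 0.625 * logit_b)   # roughly equal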
Summary
- Logit function:
P(Y = 1/X) = e^(βXi)/(1 + e^(βXi)) = 1/(1 + e^(−βXi))
(we obtain the second form by dividing both the numerator and the denominator by e^(βXi))
- Probit function:
P(Y = 1/X) = Φ(βXi)
where Φ(.) is the normal cumulative distribution function, whose density is
f(X) = (1/√(2πσ²)) exp[−(X − μ)²/(2σ²)]
Therefore, it is possible to avoid the problems of nonsensical results and of a constant impact of X on the dependent variable, since both models are non-linear.
An extension of the probit model is the tobit model, developed by James Tobin. To explain this model, let us consider the home ownership example. Suppose we want to find out the amount of money a consumer spends on buying a house in relation to his or her income and other economic variables. Now we have a problem: if a consumer does not purchase a house, we obviously have no data on housing expenditure for that consumer; we have such data only for consumers who actually purchase a house. Thus consumers are divided into two groups: one consisting of, say, N1 consumers about whom we have information on the regressors (say income, interest rate, etc.) as well as the regressand (the amount of expenditure on housing), and another consisting of, say, N2 consumers about whom we have information only on the regressors but not on the regressand. A sample in which information on the regressand is available only for some observations is known as a censored sample. Therefore, the tobit model is also known as a censored regression model.
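statsmodels has no built-in tobit estimator, so the sketch below hand-codes the censored-regression log-likelihood and maximizes it with scipy (simulated data, illustrative values): uncensored observations contribute the normal density, censored ones the probability mass P(Y* ≤ 0).

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 1_000
x = rng.normal(0, 1, n)
y_star = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # latent expenditure Y*
y = np.maximum(y_star, 0.0)                      # observed Y, censored at zero

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)                    # keeps sigma positive
    mu = b0 + b1 * x
    ll = np.where(y > 0,
                  norm.logpdf(y, mu, sigma),     # density for uncensored Y
                  norm.logcdf(-mu / sigma))      # P(Y* <= 0) for censored Y
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[0], res.x[1], np.exp(res.x[2]))      # estimates of b0, b1, sigma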
7.7 SUMMARY
- Latent variable
- Similarities and differences between the logit and probit models
- Both the logit and probit models guarantee that the estimated probabilities lie in the 0-1 range and that they are non-linearly related to the explanatory variables.
- The interpretation of the logit and probit models is the same.
- Estimation is by MLE.
- The tobit model is an extension of the probit model and is mainly applied when we have censored data:
Yi* = βXi + εi
Yi = Yi* if Yi* > 0
Yi = 0 if Yi* ≤ 0
7.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS
Answers to check your progress questions in this unit are already discussed in the text.
Contents
8.0 Aims and Objective
8.1 Introduction
8.2 Stationarity and Unit Roots
8.3 Cointegration Analysis and Error Correction Mechanism
8.4 Summary
8.5 Answers to Check Your Progress
8.6 Model Examination
The aim of this unit is to extend the discussion of regression analysis by incorporating a brief
discussion of time series econometrics.
8.1 INTRODUCTION
Recall from our unit one discussion that one of the two important type of data used in empirical
analysis is time series data. Time series data have become so frequently and intensively used in
empirical research that econometricians have recently begun to pay very careful attention to
such data.
In this very brief discussion we first define the concept of stationary time series and then
develop tests to find out whether a time series is stationary. In this connection we introduce
some related concepts, such as unit roots. We then distinguish between trend stationary and
Any time series data can be thought of as being generated by a stochastic or random process. A
type of stochastic process that has received a great deal of attention by time series analysis is
the so-called stationary stochastic process.
Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance, or lag, between the two periods and not on the actual time at which the covariance is computed. A non-stationary series, on the other hand, has no long-run mean to which the variable returns, and its variance grows without bound as time goes by.
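In symbols, a (weakly) stationary series Yt satisfies the following three conditions, where k denotes the lag:
Mean: E(Yt) = μ, constant for all t
Variance: Var(Yt) = E(Yt − μ)² = σ², constant for all t
Covariance: Cov(Yt, Yt+k) = γk, a function of k only, not of t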
For many time series, however, stationarity is unlikely to hold. If this is the case, the conventional hypothesis testing procedures based on the t, F, chi-square and other tests become suspect. In other words, if the variables in the model are non-stationary, the result is spurious regression: the fact that the variables share a common trend will tend to produce an apparently significant relationship between them. The relationship then reflects contemporaneous correlation arising from the common trend rather than a true causal relationship. Hence, with non-stationary variables, OLS generates misleading results.
Different mechanisms have been developed that enable non-stationary variables to attain stationarity. It has been argued that if a variable has a deterministic trend (i.e., one that is perfectly predictable rather than variable or stochastic), including a trend variable in the regression removes the trend component and makes the variable stationary. For example, in the regression of personal consumption expenditure (PCE) on income (PDI), if we observe a very high r2, which is typically the case, it may reflect not the true degree of association between the two variables but simply the common trend present in them; that is, with time the two variables move together. To avoid such spurious association, the common practice is to regress PCE on PDI and t (time), the trend variable. The coefficient of PDI obtained from this regression then reflects the net influence of PDI on PCE, the common trend having been removed.
However, most time series have a stochastic trend (that is, the trend is itself variable and therefore cannot be predicted with certainty). In such cases, in order to avoid the problems associated with spurious regression, pre-testing the variables for the existence of unit roots (i.e., non-stationarity) becomes compulsory. In general, if a variable has a stochastic trend, it needs to be differenced in order to attain stationarity. Such a process is called a difference stationary process.
In this regard, the Dickey-Fuller (DF) test enables us to assess the existence of stationarity. The simplest DF test starts with the following first-order autoregressive model:
Yt = ρYt−1 + Ut ……(8.1)
Subtracting Yt−1 from both sides gives
Yt − Yt−1 = ρYt−1 − Yt−1 + Ut = (ρ − 1)Yt−1 + Ut
ΔYt = δYt−1 + Ut ……(8.2)
where ΔYt = Yt − Yt−1 and δ = ρ − 1.
The test for stationarity is conducted on the parameter δ. If δ = 0 (i.e., ρ = 1), then ΔYt = Ut and hence the variable Y is not stationary (has a unit root). In time series econometrics, a series that has a unit root is known as a random walk, because the change in Y (ΔYt) is purely the result of the error term Ut. A random walk is thus an example of a non-stationary time series.
For the test of stationarity the hypotheses are formulated as follows:
H0: δ = 0 (i.e., ρ = 1)
H1: δ < 0 (i.e., ρ < 1)
Note that (8.2) is appropriate only when the series Yt has zero mean and no trend. But it is impossible to know a priori whether the true Yt has zero mean and no trend. For this reason, including a constant (drift) and a time trend in the regression is recommended. Thus (8.2) is expanded to the following form:
ΔYt = α + βT + δYt−1 + Ut ……(8.3)
Here as well the parameter δ is used in testing for stationarity. Rejecting the null hypothesis (H0: δ = 0) implies stationarity; that is, Yt is influenced by Yt−1 in addition to Ut, so the change in Yt (i.e., ΔYt) does not follow a random walk. Accepting the null hypothesis, by contrast, suggests the existence of a unit root (non-stationarity).
The DF test has a serious limitation in that it suffers from residual autocorrelation: it is inappropriate to use the DF distribution in the presence of autocorrelated errors. To remedy this weakness, the DF model is augmented with additional lagged first differences of the dependent variable; the result is the Augmented Dickey-Fuller (ADF) test, whose regression avoids autocorrelation among the residuals. Incorporating lagged first differences of Yt in (8.3) gives the following ADF model:
ΔYt = α + βT + δYt−1 + Σ(i=1 to k) φi ΔYt−i + Ut ……(8.4)
Example: Let us illustrate the ADF test using the personal consumption expenditure (PCE) data of Ethiopia. Suppose the regression of PCE corresponding to (8.4) gave the following result:
ΔPCEt = 233.08 + 1.64t − 0.06PCEt−1 + Σ φi ΔPCEt−i ……(8.5)
For our purpose the important statistic is the τ (tau) statistic on the PCEt−1 variable, which is compared against the critical values tabulated for this test. Suppose the calculated τ value does not exceed the critical table value; in that case we fail to reject the null hypothesis, which indicates that the PCE time series is not stationary. If it is not stationary, using the variable in levels will lead to spurious regression results. As stated earlier, if a variable is not stationary in levels, we need to conduct the test on the variable in its difference form. If a variable that is not stationary in levels becomes stationary after differencing n times, the variable is said to be integrated of order n, written I(n). Suppose we repeat the preceding exercise using the first difference of PCE (i.e., ΔPCEt = PCEt − PCEt−1) as the dependent variable. If the test result now allows us to reject the null hypothesis, the first difference of PCE is stationary, and PCE itself is integrated of order one, I(1).
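A sketch of how such a test might be run with statsmodels (the series here is a simulated random walk rather than actual PCE data; the regression option names assume a recent statsmodels version):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(0, 1, 300))    # simulated random walk: has a unit root

for name, series in [("levels", y), ("first difference", np.diff(y))]:
    stat, pvalue, *_ = adfuller(series, regression="ct")   # drift + trend, as in (8.4)
    print(f"{name}: tau = {stat:.2f}, p-value = {pvalue:.3f}")
# Expected pattern: fail to reject in levels, reject for the first
# difference, i.e. the series is I(1).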
Note that taking the variables in difference form captures only the dynamic interaction among the variables, with no information about the long-run relationship. However, if variables that are individually non-stationary share the same trend, this indicates that they have a stationary linear combination. This in turn implies that the variables are cointegrated, i.e., that there exists a long-run equilibrium relationship among them.
1. Distinguish between trend stationary process (TSP) and a difference stationary process
(DSP)?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. What is meant by stationarity and unit roots?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What is meant by integrated time series?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
4. Discuss the concept of spurious regression
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Cointegration among the variables reflects the presence of a long-run relationship in the system. We need to test for cointegration because differencing the variables to attain stationarity produces a model that does not show the long-run behavior of the variables. Hence, testing for cointegration is the same as testing for a long-run relationship.
There are two common approaches to testing for cointegration: i) the Engle-Granger (two-step) procedure and ii) the Johansen approach.
The Engle-Granger (EG) method requires that, for cointegration to exist, all the variables must be integrated of the same order. Once the variables are found to have the same order of integration, the next step is testing for cointegration. This requires generating the residuals from the estimated static equation and testing their stationarity. By doing so we are testing whether the deviations from the long-run relationship (captured by the error term) are stationary or not. If the residuals are found to be stationary, the variables are cointegrated; this in turn ensures that any deviation from the long-run equilibrium relationship dies out with time.
Example: Suppose we regress PCE on PDI to obtain the following estimated relationship between the two:
PCEt = β0 + β1PDIt + Ut ……(8.6)
To identify whether PCE and PDI are cointegrated (i.e., have a stationary linear combination) or not, we write (8.6) as follows:
Ut = PCEt − β0 − β1PDIt ……(8.7)
The purpose of (8.7) is to check whether Ut [i.e., the linear combination (PCEt − β0 − β1PDIt)] is I(0), or stationary. Using the procedure stated in the earlier subunit for testing stationarity, if we reject the null hypothesis then we say that the variables PCE and PDI are cointegrated.
In short, provided we check that the residuals are stationary, the traditional regression
methodology that we have learned so far (including t and F tests) is applicable to data involving
time series.
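A sketch of the two-step procedure on simulated cointegrated series (illustrative values; statsmodels assumed available). Note that the residual-based test must be judged against the Engle-Granger critical values, which are more negative than the ordinary DF values; statsmodels also provides a coint() function that packages the whole procedure.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
n = 300
pdi = np.cumsum(rng.normal(0, 1, n))          # I(1) income series
pce = 10 + 0.8 * pdi + rng.normal(0, 1, n)    # shares pdi's stochastic trend

# Step 1: static long-run regression, as in (8.6)
step1 = sm.OLS(pce, sm.add_constant(pdi)).fit()
u_hat = step1.resid

# Step 2: unit-root test on the residuals (no constant: OLS residuals have mean 0)
tau = adfuller(u_hat, regression="n")[0]
print("tau on residuals:", round(tau, 2))     # strongly negative here: cointegrated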
We just showed that PCE and PDI are cointegrated; that is, there is a long-term equilibrium relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, one can treat the error term in (8.7) as the "equilibrium error". We can use this error term to tie the short-run behavior of PCE to its long-run value. In other words, the presence of cointegration makes it possible to model the variables (in first differences) through the error correction model (ECM). In this model, the one-period lagged value of the residual serves as the error correction term, and its coefficient captures the speed of adjustment to the long-run equilibrium. The following specification shows how the ECM works with the PCE/PDI example:
ΔPCEt = α0 + α1ΔPDIt + α2Ût−1 + εt ……(8.8)
where Ût−1 is the one-period lagged value of the residual from regression (8.6) and εt is the error term with the usual properties.
In (8.8), ΔPDIt captures the short-run disturbances in PDI, whereas the error correction term Ût−1 captures the adjustment toward long-run equilibrium. If α2 is statistically significant (it should be negative, lying between 0 and −1), it tells us what proportion of the disequilibrium in PCE in one period is corrected in the next period.
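A sketch of how (8.8) might be estimated, continuing the simulated PCE/PDI data from the cointegration sketch above (statsmodels assumed available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
pdi = np.cumsum(rng.normal(0, 1, n))
pce = 10 + 0.8 * pdi + rng.normal(0, 1, n)

u_hat = sm.OLS(pce, sm.add_constant(pdi)).fit().resid      # residuals from (8.6)
d_pce, d_pdi = np.diff(pce), np.diff(pdi)                  # first differences
X = sm.add_constant(np.column_stack([d_pdi, u_hat[:-1]]))  # lagged residual term
ecm = sm.OLS(d_pce, X).fit()
print(ecm.params)   # last coefficient is alpha2, expected to lie in (-1, 0)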
Example: Suppose we obtain the following result:
ΔPCEt = 11.69 + 0.29ΔPDIt − 0.08Ût−1 ……(8.9)
The error correction coefficient of −0.08 suggests that about 8 percent of the discrepancy between the actual and the long-run value of PCE is eliminated each period.
However, the Engle-Granger method is criticized for its failure on some issues that are addressed by the Johansen approach. Interested readers can find a detailed discussion of this more advanced approach in Harris (1995).
8.4 SUMMARY
In this very brief unit we discussed time series regression analysis. The discussion showed that most economic time series are non-stationary. Stationarity can be checked using the ADF test. A regression of one time series variable on one or more other time series variables can often give spurious results; this phenomenon is known as spurious regression. One way to guard against it is to find out whether the time series are cointegrated. Cointegration of two (or more) time series suggests that there is a long-run, or equilibrium, relationship between them. The Engle-Granger or Johansen approach can be used to find out whether two or more time series are cointegrated. Note also that the ECM is a means of reconciling the short-run behavior of an economic variable with its long-run behavior.
The answers to all questions are found in the discussion under subunits 8.2 and 8.3.