
Addis Ababa University, School of Commerce
Department of Economics

Introduction to Econometrics
December 8, 2023

Chapter 3
Multiple Linear Regression



Introduction

▶ Simple linear regression (SLR) is often unrealistic, since relationships between economic variables typically involve more than one explanatory variable.
▶ Thus, to be more realistic, we include additional explanatory variables in the regression model ⇒ the subject of this chapter.
▶ Multiple linear regression (MLR) is specified as follows:
    Yi = β0 + β1X1i + β2X2i + ... + βKXKi + ui
  where β0 is the intercept and β1, ..., βK are the slope parameters.
▶ The sample counterpart is expressed as:
    Yi = β̂0 + β̂1X1i + β̂2X2i + ... + β̂KXKi + ei
  where β̂0 is the intercept estimator and β̂1, ..., β̂K are the slope estimators.


▶ What changes as we move from simple to multiple regression?
  1. Potentially more explanatory power, since more variables are included;
  2. The ability to control for other variables (and to confront the interaction of explanatory variables: correlations and multicollinearity);
  3. Visualization becomes harder: instead of a line in two dimensions, we fit a (hyper)plane in three or more dimensions;
  4. The R² is no longer simply the square of the correlation coefficient between Y and X.


Interpreting slope coefficients:

▶ Consider the following multiple linear regression given by the population regression equation (PRE), with k = 3:
    Yi = β0 + β1X1i + β2X2i + β3X3i + ui
▶ The population regression function (PRF) corresponding to this equation is:
    E(Yi|X1i, X2i, X3i) = β0 + β1X1i + β2X2i + β3X3i
▶ The slope coefficient βj is the marginal effect of the corresponding explanatory variable Xj on the conditional mean of Y.
▶ Formally, the slope coefficients βj, j = 1, 2, 3, are the partial derivatives of the PRF with respect to the explanatory variables Xj:
    ∂E(Yi|Xi)/∂Xji = ∂E(Yi|X1i, X2i, X3i)/∂Xji = βj,  j = 1, 2, 3


▶ For example, for j = 1:
    ∂E(Yi|X1i, X2i, X3i)/∂X1i = ∂(β0 + β1X1i + β2X2i + β3X3i)/∂X1i = β1
  Interpretation: A partial derivative isolates the marginal effect on the conditional mean of Y of small variations in one of the explanatory variables, while holding constant the values of the other explanatory variables in the PRF.
  ✓ β1 = (ΔE(Y|X1, X2, X3)/ΔX1)|ΔX2=0, ΔX3=0 = ∂E(Y|X1, X2, X3)/∂X1
▶ Thus, β1 is the partial marginal effect of X1 on the conditional mean of Y, holding constant the values of the other regressors X2 and X3.
▶ Including X2 and X3 in the regression function allows us to estimate the partial marginal effect of X1 on E(Y|X1, X2, X3) while
  1. holding constant the values of X2 and X3;
  2. controlling for the effects on Y of X2 and X3;
  3. conditioning on X2 and X3.
  A numerical illustration is sketched below.
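To make the partial-effect interpretation concrete, here is a minimal simulation in Python (all data and parameter values are synthetic, invented for illustration): with E(u|X) = 0, the OLS coefficient on X1 in the multiple regression recovers β1, the effect of X1 holding X2 and X3 fixed, even though X1 and X2 are correlated.

```python
import numpy as np

# Synthetic data: Y depends on X1, X2, X3; X2 is correlated with X1.
rng = np.random.default_rng(0)
n = 5_000
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)        # correlated with X1
X3 = rng.normal(size=n)
u = rng.normal(size=n)                    # E(u | X) = 0 by construction
beta = np.array([2.0, 1.5, -0.8, 0.3])    # beta0..beta3 (invented values)
Y = beta[0] + beta[1] * X1 + beta[2] * X2 + beta[3] * X3 + u

# OLS via the normal equations: (X'X) beta_hat = X'Y
X = np.column_stack([np.ones(n), X1, X2, X3])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                           # close to [2.0, 1.5, -0.8, 0.3]
```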


Assumptions of the Classical Linear Regression Model (CLRM)

▶ We divide the CLRM assumptions into three groups:
  1. Assumptions respecting the formulation of the PRF or SRF;
  2. Assumptions respecting the statistical properties of the random error term and the dependent variable;
  3. Assumptions respecting the properties of the sample data.

Assumption 1 (A1): The population regression equation (PRE) takes the form
    Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui = β0 + Σj βjXji + ui   (sum over j = 1, ..., k)

A1 incorporates three distinct assumptions:
  1. Additive random error term, ui ⇒ ∂Yi/∂ui = 1 ∀i
  2. Linearity in parameters (linearity in coefficients)
  3. Parameter (coefficient) constancy ⇒ βji = βj ∀i


Assumption 2 (A2): Zero Conditional Mean Error

    E(ui|X1i, X2i, ..., Xki) = 0

▶ Implications of A2:
  1. E(ui|X1i, X2i, ..., Xki) = 0 ⇒ E(ui) = 0 ∀i. This follows from the law of iterated expectations, which says that E[E(ui|Xs)] = E(ui). Since E(ui|Xs) = 0, A2 implies E(ui) = E[E(ui|Xs)] = E[0] = 0.
     If the conditional mean of u for each and every population value of the Xs equals zero, then the mean of these zero conditional means must also be zero.
  2. Orthogonality condition:
     E(ui|X1i, X2i, ..., Xki) = 0 ⇒ cov(Xji, ui) = E(Xjiui) = 0 ∀i, j = 1, ..., k


The equality cov(Xji, ui) = E(Xjiui) = 0 can be shown as follows:

    cov(Xji, ui) = E[(Xji − E(Xji))(ui − E(ui))]   by definition
                 = E[(Xji − E(Xji))ui]             since E(ui) = 0
                 = E(Xjiui) − E(Xji)E(ui)          since E(Xji) is a constant
                 = E(Xjiui) = 0                    since E(ui) = 0, and by A2

  3. ui is mean-independent of every regressor, ruling out both linear and (in the conditional mean) nonlinear association between ui and any of the k regressors Xj (j = 1, ..., k); in particular:
     ⇒ ρ(Xji, ui) = cov(Xji, ui)/√(var(Xji)var(ui)) = cov(Xji, ui)/(std(Xji)std(ui)) = 0
  4. The conditional mean of the population Yi values corresponding to given values Xji of the regressors Xj (j = 1, ..., k) equals the population regression function (PRF):
     E(ui|Xji) = 0 ⇒ E(Yi|Xji) = β0 + β1X1i + β2X2i + ... + βkXki = β0 + Σj βjXji


Violation of zero covariance

▶ The random error term u represents all the unobservable, unmeasured and unknown variables, other than the regressors Xj (j = 1, ..., k), that determine the population values of the dependent variable Y.
▶ Anything that causes the random error u to be correlated with one or more of the regressors Xj (j = 1, ..., k) will violate assumption A2.
▶ If Xj and u are correlated, then E(ui|Xs) must depend on Xj and so cannot be zero.
▶ Common causes of correlation or dependence between the Xj and ui (a small simulation of cause 2 is sketched below):
  1. Incorrect specification of the functional form of the relationship between Y and the Xj, j = 1, ..., k.
  2. Omission of relevant variables that are correlated with one or more of the included regressors Xj, j = 1, ..., k.
  3. Measurement errors in the regressors Xj, j = 1, ..., k.
  4. Joint determination of one or more Xj and Y.
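Cause 2 (omitted variables) is easy to see in a simulation. A minimal Python sketch, with invented synthetic data: omitting X2, which is correlated with X1, pushes X2's effect into the error term, and the short-regression slope on X1 is biased away from its true value of 2.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)      # relevant regressor, correlated with X1
Y = 1.0 + 2.0 * X1 + 1.5 * X2 + rng.normal(size=n)

# Correct model: include both regressors.
X_full = np.column_stack([np.ones(n), X1, X2])
print(np.linalg.lstsq(X_full, Y, rcond=None)[0])   # ~ [1.0, 2.0, 1.5]

# Misspecified model: omit X2, so its effect loads onto X1.
X_short = np.column_stack([np.ones(n), X1])
print(np.linalg.lstsq(X_short, Y, rcond=None)[0])  # slope ~ 2.0 + 1.5*0.8 = 3.2
```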


Assumption 3 (A3): Constant Error Variances / Homoskedastic Errors

▶ The conditional variances of the random error terms ui are identical for all observations, i.e., for all sets of regressor values Xji, j = 1, ..., k:
    var(ui|Xji) = E(ui²|Xji) = σ² > 0 ∀i
  Note: var(ui|Xji) = E([ui − E(ui|Xji)]²|Xji) = E(ui²|Xji) = σ² > 0, using E(ui|Xji) = 0.

Implications of A3:
  1. The unconditional variance of the random error u is also equal to σ²:
     var(ui) = E(ui − E(ui))² = E(ui²) = σ². This follows from the law of iterated expectations.
  2. The conditional variance of the regressand Yi corresponding to a given set of regressor values Xji, j = 1, ..., k, equals the conditional error variance σ².


▶ This can be shown as follows:

    var(Yi|Xji) = E([Yi − E(Yi|Xji)]²|Xji)        by definition
                = E([Yi − β0 − Σj βjXji]²|Xji)    since E(Yi|Xji) = β0 + Σj βjXji
                = E(ui²|Xji)                      since ui = Yi − β0 − Σj βjXji
                = σ²                              since E(ui²|Xji) = σ² by A3

▶ Thus, var(ui|Xji) = var(Yi|Xji) = σ².
▶ A3 says that the variance of the random errors for any particular set of regressor values Xji is the same as the variance of the random errors for any other set of regressor values Xjs, for all Xjs ≠ Xji:
    var(ui|Xji) = var(us|Xjs) = σ² > 0 for all Xji ≠ Xjs
▶ The conditional distributions of the population Y values around the PRF have the same constant variance σ² for all sets of regressor values:
    var(Yi|Xji) = var(Ys|Xjs) = σ² > 0 for all Xji ≠ Xjs


Assumption 4 (A4): Zero Error Covariances

▶ Two distinct random error terms ui and us (i ≠ s), corresponding to two different sets of regressor values Xi ≠ Xs, are not correlated:
    cov(ui, us|Xi, Xs) = E([(ui − E(ui|Xi))(us − E(us|Xs))]|Xi, Xs) = E(uius|Xi, Xs) = 0
  ✓ All pairs of error terms corresponding to different sets of regressor values have zero covariance.

Implications of A4:
▶ The conditional covariance of any two distinct values of the regressand, say Yi and Ys where i ≠ s, is equal to zero:
    cov(ui, us|Xi, Xs) = 0 ∀i ≠ s ⇒ cov(Yi, Ys|Xi, Xs) = E(uius|Xi, Xs) = 0 ∀i ≠ s


▶ This can easily be shown as follows:

    cov(Yi, Ys|Xi, Xs) = E([(Yi − E(Yi|Xi))(Ys − E(Ys|Xs))]|Xi, Xs)
                       = E(uius|Xi, Xs) = 0

  since Yi − E(Yi|Xi) = Yi − β0 − Σj βjXji = ui by assumption A1, and
  Ys − E(Ys|Xs) = Ys − β0 − Σj βjXjs = us by assumption A1.
▶ Therefore, cov(Yi, Ys|Xi, Xs) = E(uius|Xi, Xs) = 0 by assumption A4.


Assumption 5 (A5): Random Sampling (Independent Random Sampling)

▶ The sample data consist of N randomly selected observations on the regressand Y and the regressors Xj (j = 1, ..., k), the observable variables.

Implications of A5:
  1. The error terms ui and us are statistically independent, and hence have zero covariance, for any two observations i and s:
     Random sampling ⇒ cov(ui, us|Xi, Xs) = cov(ui, us) = 0 ∀i ≠ s
  2. The dependent variable values Yi and Ys are statistically independent, and hence have zero covariance, for any two observations i and s:
     Random sampling ⇒ cov(Yi, Ys|Xi, Xs) = cov(Yi, Ys) = 0 ∀i ≠ s
▶ A5 is often appropriate for cross-sectional regression models, but is hardly ever appropriate for time-series regression models.


Assumption 6 (A6): The number of sample observations N is greater than the number of unknown parameters K.

▶ Unless this assumption is satisfied, it is not possible to compute, from a given sample of N observations, estimates of all the unknown parameters in the model.


Assumption 7 (A7): Non-constant Regressors

▶ The sample values Xji of each regressor Xj (j = 1, ..., k) in a given sample (and hence in the population) are not all equal to a constant:
    Xji ≠ cj ∀i = 1, ..., N, where the cj are constants (j = 1, ..., k)
▶ This implies that the sample variances of all k non-constant regressors Xj (j = 1, ..., k) must be finite positive numbers for any sample size N; i.e.,
    sample variance of Xj: var(Xji) = Σi(Xji − X̄j)²/(N − 1) > 0
▶ Each non-constant regressor Xj (j = 1, ..., k) takes at least two different values in any given sample.
▶ To calculate the effect of changes in Xj on Y, the sample values Xji of the regressor Xj must vary across observations in any given sample.


Assumption 8 (A8): No Perfect Multicollinearity

▶ The sample values of the regressors Xj (j = 1, ..., k) in a multiple regression model do not exhibit perfect or exact multicollinearity.
▶ The absence of perfect multicollinearity means that there exists no exact linear relationship among the sample values of the non-constant regressors Xj (j = 1, ..., k).
▶ An exact linear relationship exists among the sample values of the non-constant regressors if they satisfy a linear relationship of the form
    λ0 + λ1X1i + λ2X2i + ... + λkXki = 0 ∀i = 1, 2, ..., N
  for some constants λj that are not all zero.
▶ Each non-constant regressor Xj (j = 1, ..., k) must exhibit some independent linear variation in the sample data.
▶ Otherwise, it is not possible to estimate the separate linear effect of each and every non-constant regressor on the regressand Y.


▶ Suppose that Yi = β0 + β1X1i + β2X2i + ui.
▶ Suppose also that X1i = 3X2i. Then:
    Yi = β0 + β1(3X2i) + β2X2i + ui
    Yi = β0 + 3β1X2i + β2X2i + ui
    Yi = β0 + (3β1 + β2)X2i + ui
    Yi = β0 + α2X2i + ui,  where α2 = 3β1 + β2
▶ It is possible to estimate from the sample data the regression coefficients β0 and α2.
▶ But from the estimate of α2 it is not possible to recover estimates of the coefficients β1 and β2.
  ✓ Reason: the equation α2 = 3β1 + β2 is one equation in two unknowns, β1 and β2.
▶ Result: it is not possible to compute from the sample data estimates of both β1 and β2, the separate linear effects of X1i and X2i on the regressand Yi. (A numerical illustration follows.)
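The same point can be seen numerically. In this minimal Python sketch (synthetic data), imposing X1 = 3X2 makes the cross-product matrix X′X rank-deficient, so the OLS normal equations have no unique solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X2 = rng.normal(size=n)
X1 = 3 * X2                                # exact linear dependence
X = np.column_stack([np.ones(n), X1, X2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))          # 2, not 3: X'X is rank-deficient
print(np.linalg.cond(XtX))                 # enormous condition number
```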
The Problem of Estimation

3.2 Estimation: The Method of OLS

▶ We specify the k-explanatory-variable SRF as follows:
    Yi = β̂0 + β̂1X1i + β̂2X2i + ... + β̂kXki + ei
  or
    Yi = Ŷi + ei,  where Ŷi = β̂0 + β̂1X1i + β̂2X2i + ... + β̂kXki
▶ The goal is to find parameter estimates by minimizing the sum of squared errors, as was done with the simple regression model:
    minimize Σei² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ... − β̂kXki)²
  with respect to β̂0 and β̂j, j = 1, ..., k (sums over i = 1, ..., n).


▶ Consider the case with only two explanatory variables, X1 and X2:
    minimize Σei² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²
▶ We do this by taking the partial derivatives with respect to the three unknown parameters β̂0, β̂1 and β̂2, equating each to zero, and solving.
▶ The normal equations then become:
    nβ̂0 + β̂1ΣX1i + β̂2ΣX2i = ΣYi
    β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i = ΣX1iYi
    β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² = ΣX2iYi
▶ This system can easily be solved using Cramer's rule or matrix algebra to find the formulas for the parameter estimates.


▶ An alternative approach is to begin by expressing all the data as deviations from the sample means.
▶ Our three-variable regression model is:
    Yi = β̂0 + β̂1X1i + β̂2X2i + ei
▶ Averaging over the sample observations gives:
    Ȳ = β̂0 + β̂1X̄1 + β̂2X̄2   (since ē = 0)
▶ Subtracting the second equation from the first gives the deviation form:
    yi = β̂1x1i + β̂2x2i + ei
  where yi = Yi − Ȳ, x1i = X1i − X̄1 and x2i = X2i − X̄2.
▶ Note that the intercept β̂0 disappears from the deviation form of the equation, but it may be recovered from:
    β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2


▶ For β̂1 and β̂2, we minimize the sum of squared residuals of the deviation-form equation:
    Σei² = Σ(yi − ŷi)² = Σ(yi − β̂1x1i − β̂2x2i)²
▶ We need to solve:
    ∂Σei²/∂β̂1 = ∂Σ(yi − β̂1x1i − β̂2x2i)²/∂β̂1 = 0
  and
    ∂Σei²/∂β̂2 = ∂Σ(yi − β̂1x1i − β̂2x2i)²/∂β̂2 = 0
▶ which give, respectively, the following normal equations:
    Σx1iyi = β̂1Σx1i² + β̂2Σx1ix2i
    Σx2iyi = β̂1Σx1ix2i + β̂2Σx2i²


▶ We can reorganize these two normal equations in matrix form as follows:

    [ Σx1i²     Σx1ix2i ] [ β̂1 ]   [ Σx1iyi ]
    [ Σx1ix2i   Σx2i²   ] [ β̂2 ] = [ Σx2iyi ]

▶ Using Cramer's rule, we have the following results (a numerical check is sketched below):

    β̂1 = [(Σx1iyi)(Σx2i²) − (Σx2iyi)(Σx1ix2i)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]

  and

    β̂2 = [(Σx1i²)(Σx2iyi) − (Σx1ix2i)(Σx1iyi)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]
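A minimal Python check of these formulas, on invented synthetic data: the deviation-form Cramer's-rule expressions reproduce the estimates from a direct least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 4.0 + 1.2 * X1 - 0.7 * X2 + rng.normal(size=n)

# Deviation form: x1, x2, y are deviations from the sample means.
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / den
b2 = ((x1 @ x1) * (x2 @ y) - (x1 @ x2) * (x1 @ y)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()    # recover the intercept

X = np.column_stack([np.ones(n), X1, X2])
print(b0, b1, b2)
print(np.linalg.lstsq(X, Y, rcond=None)[0])        # same three values
```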


The Variances of OLS Estimators

    var(β̂0) = [1/n + (X̄1²Σx2i² + X̄2²Σx1i² − 2X̄1X̄2Σx1ix2i) / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

    var(β̂1) = [Σx2i² / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

    var(β̂2) = [Σx1i² / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

We estimate σ² with σ̂², where
    σ̂² = Σei² / (n − k)
and k is the number of estimated parameters (here k = 3: the intercept and two slopes).
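A short Python sketch of these variance formulas (again on invented synthetic data, with k = 3 estimated parameters, so df = n − 3):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 4.0 + 1.2 * X1 - 0.7 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b                                   # residuals
sigma2_hat = (e @ e) / (n - 3)                  # sigma-hat^2 = RSS / (n - k)

x1, x2 = X1 - X1.mean(), X2 - X2.mean()
den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
print(np.sqrt((x2 @ x2) / den * sigma2_hat))    # se(beta1_hat)
print(np.sqrt((x1 @ x1) / den * sigma2_hat))    # se(beta2_hat)
```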


Goodness of fit in MLR:

▶ From the SRE with k = 2 regressors, we have:
    Yi = β̂0 + β̂1X1i + β̂2X2i + ei
▶ Its sample mean can be written as:
    Ȳ = β̂0 + β̂1X̄1 + β̂2X̄2
▶ Subtracting this from the SRE gives us the deviation form of the SRF:
    yi = β̂1x1i + β̂2x2i + ei ⇒ yi = ŷi + ei
▶ Solving for the residual, we have:
    ei = yi − β̂1x1i − β̂2x2i
▶ Multiplying through by ei:
    ei² = (yi − β̂1x1i − β̂2x2i)ei = yiei − β̂1x1iei − β̂2x2iei
▶ Summing over all observations, we get:

    Σei² = Σyiei − β̂1Σx1iei − β̂2Σx2iei
    Σei² = Σyiei   since Σx1iei = Σx2iei = 0 (from the OLS normal equations)
▶ Again, substituting ei = yi − β̂1x1i − β̂2x2i into this last equation, we have:
    Σei² = Σyi(yi − β̂1x1i − β̂2x2i)
    Σei² = Σyi² − β̂1Σx1iyi − β̂2Σx2iyi
    ⇒ Σyi² = β̂1Σx1iyi + β̂2Σx2iyi + Σei²
    ⇒ TSS = ESS + RSS
  where Σyi² = TSS; β̂1Σx1iyi + β̂2Σx2iyi = Σŷi² = ESS; Σei² = RSS.
▶ Then,
    R² = ESS/TSS = Σŷi²/Σyi² = (β̂1Σx1iyi + β̂2Σx2iyi)/Σyi²
  or
    R² = 1 − RSS/TSS = 1 − Σei²/Σyi²
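A minimal Python check (synthetic data) that the two expressions for R² above agree:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 4.0 + 1.2 * X1 - 0.7 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b                       # residuals
y = Y - Y.mean()                    # deviations: TSS = y'y
tss, rss = y @ y, e @ e

print(1 - rss / tss)                # R^2 = 1 - RSS/TSS
yhat = X @ b - Y.mean()             # fitted values in deviation form
print((yhat @ yhat) / tss)          # R^2 = ESS/TSS, the same number
```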

▶ The coefficient of multiple determination (R²) measures the proportion of the variation in the dependent variable explained by (the set of all the regressors in) the model.
▶ R² can be used to compare the goodness-of-fit of alternative regression equations, but only if the regression models satisfy two conditions:
  1. The models must have the same dependent variable.
     Reason: TSS, ESS and RSS depend on the units in which the regressand (Y) is measured. For instance, the TSS for Y is not the same as the TSS for ln(Y).
  2. The models must have the same number of regressors and parameters (the same value of K + 1).
     Reason: adding a variable to a model never raises the RSS (equivalently, never lowers ESS or R²), even if the new variable is not very relevant.


Adjusted R²

▶ The adjusted R-squared, R̄², attaches a penalty to adding more variables.
▶ It is modified to account for changes/differences in degrees of freedom (df), due to differences in the number of regressors and/or the sample size (n).
▶ If adding a variable raises R̄² for a regression, this is a better indication that it has improved the model than if the addition merely raises R².


    R² = Σŷi²/Σyi² = 1 − Σei²/Σyi²

▶ Dividing RSS and TSS by their respective degrees of freedom (n − k and n − 1, where k is the number of estimated parameters, including the intercept) gives the adjusted R²:

    R̄² = 1 − [Σei²/(n − k)] / [Σyi²/(n − 1)] = 1 − (Σei²/Σyi²) · (n − 1)/(n − k)

    ⇒ R̄² = 1 − (1 − R²)(n − 1)/(n − k)
    ⇒ 1 − R̄² = (1 − R²)(n − 1)/(n − k)

▶ As long as k > 1, (n − 1)/(n − k) > 1, so 1 − R̄² > 1 − R² ⇒ R̄² < R².
▶ In general, R̄² ≤ R², and as n grows large relative to k, R̄² → R².
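As a quick worked check, using the figures from the example later in this chapter (n = 10, TSS = 3450, RSS = 364.22, k = 3 estimated parameters):

    R² = 1 − 364.22/3450 ≈ 0.8944
    R̄² = 1 − (1 − 0.8944) × (10 − 1)/(10 − 3) = 1 − 0.1056 × (9/7) ≈ 0.8643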


Relationship between R̄² and R²

  1. While R² is always non-negative, R̄² can be positive or negative.
  2. R̄² can be used to compare the goodness-of-fit of two or more regression models only if the models have the same regressand.
  3. Including more regressors reduces both RSS and the degrees of freedom; R̄² rises only if the first effect dominates.
  4. Neither R̄² nor R² should be the sole criterion for choosing between/among models. In addition to R̄², one should also:
     - consider the expected signs and values of the coefficients, and
     - look for consistency with economic theory or reasoning (possible explanations).


Coefficient of Partial Determination (r²)

▶ When more than one regressor is included in the regression equation, we may be interested in how much of the variation in the regressand a given regressor explains after controlling for the others.
▶ In the case of k = 2, we compute the coefficients of partial determination as follows (a small sketch follows below):
    r²y2.1 = (R²y.12 − R²y.1) / (1 − R²y.1)
  and
    r²y1.2 = (R²y.12 − R²y.2) / (1 − R²y.2)
▶ R²y.1 and R²y.2 are the coefficients of determination from the SLRs of Y on X1 and of Y on X2, respectively; R²y.12 is the multiple coefficient of determination.
▶ The inclusion of X2 increases the explanatory power of the model by (R²y.12 − R²y.1)Σyi².
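A minimal Python sketch (synthetic data, with a hypothetical helper function r_squared) of the r²y2.1 computation from the three R² values defined above:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 1.0 * X2 + rng.normal(size=n)

def r_squared(y, *regressors):
    """R^2 from an OLS regression of y on an intercept and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    yd = y - y.mean()
    return 1 - (e @ e) / (yd @ yd)

r2_y1 = r_squared(Y, X1)                  # R^2 from the SLR of Y on X1
r2_y12 = r_squared(Y, X1, X2)             # R^2 from the MLR of Y on X1 and X2
print((r2_y12 - r2_y1) / (1 - r2_y1))     # r^2_{y2.1}
```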


Example:
▶ From the earlier example we have the following information:
    n = 10; TSS = Σyi² = 3450; ESS = Σŷi² = 3085.78; RSS = Σei² = 364.22;
    β̂0 = 111.692; β̂1 = −7.19; β̂2 = 0.0143

  1. Find the multiple coefficient of determination (R²).
  2. How much of the variation in Y is explained by X1 alone?
  3. What is the contribution of the inclusion of X2 to the explanatory power of the model?
  4. Even after X2 is included, how much of the variation in Y is left unexplained?
  5. Find R̄².
  6. Find r²y2.1.



Inferences

Statistical Inference in Multiple Linear Regression

The normality assumption:
▶ The error term ui, and hence the estimators β̂0 and β̂j, are assumed to be normally distributed:

    ui ∼ N(0, σ²) ⇒ Yi ∼ N(β0 + β1X1i + β2X2i, σ²)

    β̂0 ∼ N(β0, var(β̂0));
        var(β̂0) = [1/n + (X̄1²Σx2i² + X̄2²Σx1i² − 2X̄1X̄2Σx1ix2i) / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

    β̂1 ∼ N(β1, var(β̂1));  var(β̂1) = [Σx2i² / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

    β̂2 ∼ N(β2, var(β̂2));  var(β̂2) = [Σx1i² / (Σx1i²Σx2i² − (Σx1ix2i)²)] σ²

We estimate σ² with σ̂², where σ̂² = Σei² / (n − k).

Confidence Intervals
▶ Given the significance level (the type I error probability) α, the confidence intervals for β0 and βj are given as follows:
    100(1 − α)% two-sided CI for β0: β̂0 ± t(α/2, n−k−1) · se(β̂0)
    100(1 − α)% two-sided CI for βj: β̂j ± t(α/2, n−k−1) · se(β̂j),  j = 1, 2, ..., k
  where n − k − 1 is the degrees of freedom (k slope coefficients plus the intercept).
▶ The conventional significance level is 5%. Thus, the 95% CIs for β0 and βj are given as follows:
    95% two-sided CI for β0: β̂0 ± t(0.025, n−k−1) · se(β̂0)
    95% two-sided CI for βj: β̂j ± t(0.025, n−k−1) · se(β̂j),  j = 1, 2, ..., k


Example: based on the earlier figures
▶ Given that β̂0 = 111.692; β̂1 = −7.19; β̂2 = 0.0143; n = 10;
    var(β̂0) = 484.487; var(β̂1) = 5.71; var(β̂2) = 0.00011
    se(β̂0) = 22.011; se(β̂1) = 2.39; se(β̂2) = 0.0104
▶ Construct 95% CIs for β0, β1 and β2. (A worked sketch follows.)
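A worked sketch of this exercise in Python, assuming df = n − k − 1 = 10 − 2 − 1 = 7:

```python
from scipy import stats

beta_hat = {"b0": 111.692, "b1": -7.19, "b2": 0.0143}
se = {"b0": 22.011, "b1": 2.39, "b2": 0.0104}

t_crit = stats.t.ppf(0.975, df=7)          # two-sided 95% critical value, ~2.365
for name in beta_hat:
    lo = beta_hat[name] - t_crit * se[name]
    hi = beta_hat[name] + t_crit * se[name]
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
# b1: roughly (-12.84, -1.54); this interval excludes zero
```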


Hypothesis Testing
▶ To test the statistical relationship between economic variables in MLR, we use two types of tests:
  1. the t-test: to test individual coefficients;
  2. the F-test: to test more than one coefficient at a time (joint tests).
▶ The t-test is conducted using
    t = (β̂j − βj) / se(β̂j)
  where β̂j is the estimated value and βj is the hypothesized value.
▶ In our earlier example, test the claim that β1 and β2 are zero:
    H0: β1 = 0 vs HA: β1 ≠ 0, and H0: β2 = 0 vs HA: β2 ≠ 0
  (A worked check follows.)
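A worked check, using the earlier figures and the two-sided 5% critical value t(0.025, 7) ≈ 2.365:

    t(β̂1) = (−7.19 − 0)/2.39 ≈ −3.01; since |−3.01| > 2.365, reject H0: β1 = 0
    t(β̂2) = (0.0143 − 0)/0.0104 ≈ 1.38; since 1.38 < 2.365, do not reject H0: β2 = 0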


The F-test
▶ The F-statistic is used to test the joint significance of all the slope coefficients in a multiple linear regression model.
▶ If the unrestricted PRE is given as:
    Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui
▶ The null and alternative hypotheses are:
    H0: βj = 0 for all j = 1, ..., k
    HA: βj ≠ 0 for at least one j = 1, ..., k
▶ The null hypothesis H0 says that all slope coefficients are jointly equal to zero.
▶ The alternative hypothesis HA says that some or all of the slope coefficients are not equal to zero.


▶ The F-statistic for the joint significance test is computed as the ratio of the MSS (mean sum of squares) for the sample regression function to the MSS for the residuals:
    F0 = [ESS/(K − 1)] / [RSS/(N − K)] = [Σŷi²/(K − 1)] / [Σûi²/(N − K)]
  where K is the total number of estimated parameters (including the intercept).
▶ Under the null hypothesis H0: βj = 0 for all j = 1, ..., k, the computed statistic F0 follows the F[K − 1, N − K] distribution:
    F0 = [ESS/(K − 1)] / [RSS/(N − K)] ∼ F[K − 1, N − K]
▶ Alternatively, F0 can be computed by dividing the numerator and denominator of F0 by TSS:
    F0 = [(ESS/TSS)/(K − 1)] / [(RSS/TSS)/(N − K)] = [R²/(K − 1)] / [(1 − R²)/(N − K)]
  since R² = ESS/TSS and 1 − R² = RSS/TSS.
  Thus, F0 = [ESS/(K − 1)] / [RSS/(N − K)] = [R²/(K − 1)] / [(1 − R²)/(N − K)]


Decision Rule:
▶ Retain H0 at significance level α if F0 ≤ Fα[K − 1, N − K].
▶ Reject H0 at significance level α if F0 > Fα[K − 1, N − K].
  or
✓ Retain H0 at significance level α if the p-value of F0 ≥ α.
✓ Reject H0 at significance level α if the p-value of F0 < α.
Example: test the joint significance of X1 and X2 in the earlier example:
    H0: β1 = β2 = 0 vs HA: at least one of β1, β2 ≠ 0
  (A worked sketch follows.)
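A worked sketch of this joint test in Python, using the earlier figures (ESS = 3085.78, RSS = 364.22, N = 10, K = 3 parameters):

```python
from scipy import stats

ESS, RSS, N, K = 3085.78, 364.22, 10, 3

F0 = (ESS / (K - 1)) / (RSS / (N - K))      # ~29.65
F_crit = stats.f.ppf(0.95, K - 1, N - K)    # F_0.05[2, 7] ~ 4.74
p_value = stats.f.sf(F0, K - 1, N - K)

print(F0, F_crit, p_value)
# F0 > F_crit (p-value << 0.05): reject H0; X1 and X2 are jointly significant.
```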


************* End of Chapter Three *************
