Chapter 4 Multiple Regression Model
Tasew T.(PhD)
3.1 Introduction
Previously, the bivariate regression equation was: Yi = β1 + β2Xi + ui
3.2 Model Assumptions

The model: Y = β1 + β2X2 + β3X3 + … + βkXk + u

• Dependent variable: Y, of size n×1.
• Independent (explanatory) variables: X2, X3, …, Xk, each of size n×1.

Assumptions:
• All the assumptions we made in chapter two hold here.
• The only additional assumption is no multicollinearity (or minimal multicollinearity among the independent variables): no exact linear relationship exists between any of the explanatory variables, i.e., there is no linear dependence between the regressors.
Assumption of no multicollinearity:
• Multicollinearity means the existence of a “perfect,” or “exact”,
linear relationship among some or all explanatory variables
included in the model.
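To see what "exact" linear dependence means in practice, here is a minimal sketch (with made-up numbers, not data from these notes): when one regressor is an exact linear combination of others, the data matrix loses rank and OLS cannot separate the individual effects.

```python
import numpy as np

# Hypothetical illustration: X4 is an exact linear combination of X2 and X3,
# so the data matrix is rank-deficient (perfect multicollinearity).
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
X4 = 2 * X2 + 3 * X3                      # exact linear dependence

X = np.column_stack([np.ones(5), X2, X3, X4])
# rank is 3 rather than 4: the four columns are linearly dependent,
# so X'X is singular and the normal equations have no unique solution
print(np.linalg.matrix_rank(X))
```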
3.3 OLS Estimation of Parameters
• The k-variable population regression function involving the dependent variable Y and k−1 explanatory variables X2, X3, …, Xk may be written as:
• Yi = β1 + β2X2i + β3X3i + … + βkXki + Ui
• The Three-variable Linear Model:
• Here we will consider a model with two explanatory variables:
• Yi = β1 + β2X2i + β3X3i + Ui , where
• Y is the dependent variable, X2 and X3 are the explanatory variables (or regressors),
• U is the stochastic disturbance term, and i denotes the ith observation; in case the data are time series, the subscript t will denote the tth observation.
• β1 is the intercept term. As usual, it gives the mean or average effect on Y of all the variables excluded from the model, although its mechanical interpretation is the average value of Y when X2 and X3 are set equal to zero.
Estimation of parameters...
• The coefficients β2 and β3 are called the partial regression (slope) coefficients.
• The partial regression coefficients are interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all other explanatory variables.
• In other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.
• For example, β̂2 measures the effect of X2 on Y after eliminating the effects of X3.
– It gives the "direct" or the "net" effect of a unit change in X2 on the mean value of Y, net of any effect that X3 may have on mean Y.
• Likewise, β3 measures the change in the mean value of Y per unit change in X3, holding the value of X2 constant.
Derive OLS estimators of multiple regression

Ŷ = β̂1 + β̂2X2 + β̂3X3
û = Y − β̂1 − β̂2X2 − β̂3X3

min RSS = min Σû² = min Σ(Y − β̂1 − β̂2X2 − β̂3X3)²

Setting the partial derivative of RSS with respect to each estimator equal to zero:

∂RSS/∂β̂1 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−1) = 0
∂RSS/∂β̂2 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−X2) = 0
∂RSS/∂β̂3 = 2Σ(Y − β̂1 − β̂2X2 − β̂3X3)(−X3) = 0

Rearranging the three equations gives the normal equations:

nβ̂1 + β̂2ΣX2 + β̂3ΣX3 = ΣY (1)
β̂1ΣX2 + β̂2ΣX2² + β̂3ΣX2X3 = ΣX2Y (2)
β̂1ΣX3 + β̂2ΣX2X3 + β̂3ΣX3² = ΣX3Y (3)

In matrix notation: (X′X)β̂ = X′Y
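The matrix form of the normal equations can be solved directly. A minimal sketch with made-up data (the numbers below are illustrative, not from these notes):

```python
import numpy as np

# made-up sample: Y depends on X2 and X3
X2 = np.array([2.0, 3.0, 5.0, 4.0, 6.0, 7.0])
X3 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
Y  = np.array([5.0, 8.0, 11.0, 12.0, 16.0, 19.0])

# data matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(X2), X2, X3])

# solve the normal equations (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # [beta1_hat, beta2_hat, beta3_hat]
```

Solving (X′X)β̂ = X′Y this way reproduces exactly what a least-squares routine would return for the same data.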
OLS...
• Following the convention of letting the lowercase letters denote deviations from sample mean values, one can derive the following formulas from the normal equations (2) and (3), rewritten in deviation form:

| Σx2²   Σx2x3 | | β̂2 |   | Σx2y |
| Σx2x3  Σx3²  | | β̂3 | = | Σx3y |
Cramer's rule:

β̂2 = (Σyx2 · Σx3² − Σyx3 · Σx2x3) / (Σx2² · Σx3² − (Σx2x3)²)

β̂3 = (Σyx3 · Σx2² − Σyx2 · Σx2x3) / (Σx2² · Σx3² − (Σx2x3)²)

β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3
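The deviation-form formulas above can be checked numerically. A sketch with made-up data (illustrative numbers, not from these notes), verifying that Cramer's rule gives the same estimates as a least-squares solver:

```python
import numpy as np

X2 = np.array([2.0, 3.0, 5.0, 4.0, 6.0, 7.0])
X3 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
Y  = np.array([5.0, 8.0, 11.0, 12.0, 16.0, 19.0])

# lowercase = deviations from sample means
x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()

denom = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / denom
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / denom
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()   # intercept from the means

# cross-check against numpy's least-squares fit of Y on [1, X2, X3]
X = np.column_stack([np.ones_like(X2), X2, X3])
print(b1, b2, b3)
```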
Example: House Price Results
Dependent variable: Canadian dollars per month (results table not reproduced here).
Example 1: Estimate the regression parameters using the following data.
(Data table not reproduced here; its columns are X2, X3, yx2, yx3, x2x3, x2², and x3².)
Example…
• Interpret the partial slope coefficients, where X2 and X3 are fertilizer (kg) and pesticide used for corn production (Y).
3.4 Variances and standard errors of estimators
• We need the standard errors for two main purposes:
– to establish confidence intervals, and
– to test statistical hypotheses.
• The variances of the estimated regression coefficients are estimated as follows. Unbiased estimates of the variances are given by:

var(β̂2) = σ̂² / [Σx2² (1 − r23²)]
var(β̂3) = σ̂² / [Σx3² (1 − r23²)]

• where r23 is the coefficient of correlation between X2 and X3.
• Taking the square roots, we obtain the standard errors of the estimated regression coefficients.
• An unbiased estimator of the variance of the errors σ² is σ̂² = ESS/(n − 3), where ESS is the error (residual) sum of squares Σû².
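These variance formulas can be computed directly from the deviation sums. A sketch with made-up data (illustrative numbers, not the corn example):

```python
import numpy as np

X2 = np.array([2.0, 3.0, 5.0, 4.0, 6.0, 7.0])
X3 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
Y  = np.array([5.0, 8.0, 11.0, 12.0, 16.0, 19.0])
n = len(Y)

# OLS fit of Y on [1, X2, X3], then residuals
X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
u_hat = Y - X @ b
sigma2_hat = (u_hat @ u_hat) / (n - 3)    # error variance with n - 3 df

# deviation sums and the correlation r23 between X2 and X3
x2, x3 = X2 - X2.mean(), X3 - X3.mean()
r23 = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))

var_b2 = sigma2_hat / ((x2 @ x2) * (1 - r23 ** 2))
var_b3 = sigma2_hat / ((x3 @ x3) * (1 - r23 ** 2))
se_b2, se_b3 = np.sqrt(var_b2), np.sqrt(var_b3)
print(se_b2, se_b3)
```

For the three-variable model these expressions agree with the diagonal of σ̂²(X′X)⁻¹, the general matrix formula for the coefficient covariance.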
Example: statistics to compute the residual sum of squares.
To compute the variances and standard errors of the estimates, we need to estimate the error variance.
Alternatively, we can use the short-hand formula Σû² = Σy² − β̂2Σyx2 − β̂3Σyx3.

Example: variances and standard errors of estimates
• se(β̂2) = 0.2449, so var(β̂2) = (0.2449)² ≈ 0.060
• se(β̂3) = 0.2645, so var(β̂3) = (0.2645)² ≈ 0.070
• Residual variance: σ̂² = 1.952
3.5 Hypothesis Testing and Inferences
• Hypothesis testing concerns:
– 1. the significance of individual partial regression coefficients, and
– 2. the overall significance of the regression parameters.
• Hypothesis testing about an individual partial regression coefficient can be conducted using the t-statistic as usual.
• To test whether each coefficient is significant or not, the null and alternative hypotheses are:
– H0: βj = 0
– H1: βj ≠ 0
• The test statistic (t-calculated) is: tj = (β̂j − 0) / se(β̂j)
• Decision rule: if |tj| > tα/2 with (n − 3) df, we reject H0 and conclude that βj is significant, that is, the regressor variable Xj (j = 2, 3, …, k) significantly affects the dependent variable.

t2 = (β̂2 − 0) / se(β̂2) = 0.65/0.24 ≈ 2.7

Answer: Yes, β̂2 is statistically significant and is significantly different from zero. From our example, we reject the null hypothesis and conclude that applying fertilizer significantly affects corn production at the 5% level of significance.
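The fertilizer t-test above can be reproduced in a few lines. The sample size is not stated in these notes, so the degrees of freedom below are an assumption for illustration (n = 10, hence df = n − 3 = 7 and a two-sided 5% critical value of about 2.365 from the t-table):

```python
# t-test for H0: beta2 = 0, using the numbers from the fertilizer example
b2_hat, se_b2 = 0.65, 0.24
t2 = (b2_hat - 0.0) / se_b2        # t-calculated

# ASSUMPTION for illustration: n = 10 observations, so df = 7;
# the two-sided 5% critical value t_{0.025, 7} is about 2.365
t_crit = 2.365
print(t2, abs(t2) > t_crit)        # reject H0 when |t| exceeds the critical value
```

With any reasonable sample size the conclusion matches the text: |t2| ≈ 2.7 exceeds the critical value, so β2 is significant at the 5% level.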
1. Individual partial coefficient test (cont.)

Holding X2 constant, does X3 have an effect on Y? That is, does pesticide significantly affect corn productivity?

H0: β3 = 0 (i.e., ∂Y/∂X3 = 0)
H1: β3 ≠ 0

t3 = (β̂3 − 0) / se(β̂3) = 1.11/0.264 ≈ 4.2

Answer: Yes, β̂3 is statistically significant and is significantly different from zero.
We reject the null hypothesis and conclude that using pesticide significantly affects corn production at the 5% level of significance.
2. Testing overall significance of the multiple regression: Test of model adequacy
• In multiple regression, one can test for the overall significance of a sample regression by using the F-statistic.
• This tests the overall significance of the regression parameters.
• A test for the significance of R², or a test of model adequacy, is accomplished by testing the hypotheses:
– H0: β2 = β3 = β4 = … = βk = 0 (all slope coefficients have zero effect)
– Ha: H0 is not true (at least one of the coefficients is non-zero)
• The test statistic is given by:
– Fcal = (RSS/(k − 1)) / (ESS/(n − k)), where RSS is the regression sum of squares and ESS is the error sum of squares.
• In our case, Fcal = (RSS/(3 − 1)) / (ESS/(n − 3)), where k is the number of parameters estimated from the sample data (k = 3 in our case, since we estimate β1, β2, and β3) and n is the sample size.
• The model is adequate in explaining the relationship between the dependent variable and one or more of the independent variables if: Fcal > Fα(k − 1, n − k)
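The F-statistic is a short calculation once the sums of squares are known. A sketch using the notation of these notes (RSS = regression SS, ESS = error SS) with made-up values, since the example's sums of squares are not reproduced here:

```python
# Hypothetical sums of squares and sample size (illustrative only)
RSS, ESS = 80.0, 20.0      # regression SS and error SS
n, k = 25, 3               # sample size and number of estimated parameters

F_cal = (RSS / (k - 1)) / (ESS / (n - k))
print(F_cal)               # compare with the critical F(k-1, n-k) value
```

Here F_cal = (80/2) / (20/22) = 44.0, which would be compared with the tabulated Fα(2, 22) value; H0 is rejected when F_cal exceeds it.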
2. Testing overall significance of the multiple regression
Residual SS=
Adjusted R 2
• One of the drawbacks of R² is that it is a non-decreasing function of the number of regressors.
– As the number of explanatory (independent) variables increases, R² may increase.
– This implies that the goodness of fit of an estimated model depends on the number of independent (explanatory) variables, regardless of whether they are important or not.
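The standard remedy for this drawback is the adjusted R², which penalizes additional regressors: R̄² = 1 − (1 − R²)(n − 1)/(n − k). A sketch with made-up numbers showing the penalty at work:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k estimated parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical numbers: adding a fourth parameter raises R^2 only a little,
# so the adjusted value falls even though the unadjusted R^2 rose.
print(adjusted_r2(0.800, 25, 3))   # about 0.782
print(adjusted_r2(0.805, 25, 4))   # about 0.777, lower despite higher R^2
```

Unlike R², the adjusted version can decrease when an unimportant variable is added, which is why it is preferred for comparing models with different numbers of regressors.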