CHAPTER THREE - Multiple Linear Regression Analysis
We Focus on the Model with Two Explanatory Variables
In simple regression we study the relationship between a
dependent variable and a single explanatory (independent)
variable; that is, we assume that the dependent variable is
influenced by only one explanatory variable.
However, many economic variables are influenced by several
factors or variables.
For instance, in investment studies we examine the relationship
between the quantity invested (or the decision whether or not
to invest) and the interest rate, share prices, the exchange rate, etc.
Multiple regression analysis is an extension of simple
regression analysis to cover cases in which the dependent
variable is hypothesized to depend on more than one
explanatory variable.
Much of the analysis will be a straightforward extension of the
simple regression model, but we will encounter two new
problems.
First, when evaluating the influence of a given explanatory
variable on the dependent variable, we now have to face the
problem of discriminating between its effects and the effects
of the other explanatory variables.
Second, we shall have to tackle the problem of model
specification.
Adding more variables to the simple linear regression model
leads us to the discussion of multiple regression models, i.e.
models in which the dependent variable (or regressand)
depends on two or more explanatory variables, or regressors.
The multiple linear regression model (population regression function),
in which we have one dependent variable Y and k explanatory
variables, is given by
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui,  i = 1, 2, …, n
• Definition of the multiple linear regression model — terminology:
– Y: dependent variable, explained variable, response variable, …
– X1, …, Xk: independent variables, explanatory variables, regressors, …
– u: error term, disturbance, unobservables, …
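To make the setup concrete, here is a minimal sketch (Python with numpy and statsmodels, simulated data, hypothetical variable names) of estimating such a model by OLS:

```python
# Minimal OLS sketch for y = b0 + b1*x1 + b2*x2 + u on simulated data.
# All names and numbers are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)            # unobserved error term
y = 1.0 + 0.8 * x1 - 0.3 * x2 + u            # "true" population relationship

X = sm.add_constant(np.column_stack([x1, x2]))   # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.params)                            # estimated b0, b1, b2
```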
• Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that would otherwise end up in the error term
– Allow for more flexible functional forms
• Example: Wage equation — e.g. wage = β0 + β1educ + β2exper + u, where the error
term u captures all other factors; including experience explicitly lets us measure
the effect of education on the hourly wage while holding experience fixed.
• Example: Determinants of college GPA — colGPA regressed on high school grade point
average (hsGPA) and ACT score, with an estimated hsGPA coefficient of .453
– Holding ACT fixed, another point on high school grade point average
is associated with another .453 points of college grade point average
– Or: if we compare two students with the same ACT, but the hsGPA
of student A is one point higher, we predict student A to have a
colGPA that is .453 higher than that of student B
– Holding high school grade point average fixed, another 10 points on
ACT are associated with less than one point on college GPA
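A tiny numeric sketch of this ceteris paribus interpretation. The hsGPA coefficient 0.453 is the one quoted above; the intercept and ACT coefficient below are placeholder values used only to make the snippet runnable:

```python
# Ceteris paribus interpretation: compare predictions that differ in one regressor only.
# b_hsgpa = 0.453 is the coefficient quoted in the text; b0 and b_act are placeholders.
b0, b_hsgpa, b_act = 1.3, 0.453, 0.01

def colgpa_hat(hsgpa, act):
    return b0 + b_hsgpa * hsgpa + b_act * act

# Two students with the same ACT score but hsGPA one point apart:
diff = colgpa_hat(3.5, act=25) - colgpa_hat(2.5, act=25)
print(round(diff, 3))   # 0.453 -> exactly the hsGPA coefficient, since ACT is held fixed
```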
Assumptions of the Multiple Linear Regression
In order to specify our multiple linear regression model
and proceed with our analysis of it, some assumptions are required.
These assumptions are the same as in the single-explanatory-variable
model developed earlier, except for the additional assumption of
no perfect multicollinearity.
These assumptions are:
Assumptions
1. The linearity assumption: the model is linear in the parameters.
2. The X's and the disturbances are uncorrelated: E(ui | X1i, X2i, …, Xki) = 0 for all i = 1, 2, …, n.
Estimation of Partial Regression Coefficients
For the sample regression function Yi = β̂0 + β̂1X1i + β̂2X2i + ei, OLS chooses β̂0, β̂1, β̂2
to minimize Σei². Setting the partial derivatives of Σei² to zero gives
∂Σei²/∂β̂0 = −2 Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 …………… (5)
∂Σei²/∂β̂1 = −2 ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 …………… (6)
∂Σei²/∂β̂2 = −2 ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 …………… (7)
Rearranging yields the normal equations
ΣX1iYi = β̂0 ΣX1i + β̂1 ΣX1i² + β̂2 ΣX1iX2i …………… (8)
ΣX2iYi = β̂0 ΣX2i + β̂1 ΣX1iX2i + β̂2 ΣX2i² …………… (9)
β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 …………… (10)
Substituting (10) for β̂0 into the normal equation (8) gives
ΣX1iYi = (Ȳ − β̂1X̄1 − β̂2X̄2) ΣX1i + β̂1 ΣX1i² + β̂2 ΣX1iX2i
ΣX1iYi − nX̄1Ȳ = β̂1 (ΣX1i² − nX̄1²) + β̂2 (ΣX1iX2i − nX̄1X̄2) …………… (11)
We know that
ΣXiYi − nX̄Ȳ = Σxiyi  and  ΣXi² − nX̄² = Σxi²,
where lower-case letters denote deviations from the sample means (xi = Xi − X̄, yi = Yi − Ȳ).
Substituting these results, the normal equations (8) and (9) can be
written in deviation form as follows:
Σx1iyi = β̂1 Σx1i² + β̂2 Σx1ix2i …………… (12)
Σx2iyi = β̂1 Σx1ix2i + β̂2 Σx2i² …………… (13)
Let's bring (12) and (13) together:
Σx1iyi = β̂1 Σx1i² + β̂2 Σx1ix2i …………… (14)
Σx2iyi = β̂1 Σx1ix2i + β̂2 Σx2i² …………… (15)
Solving (14) and (15) simultaneously for β̂1 and β̂2 gives
β̂1 = (Σx1iyi · Σx2i² − Σx1ix2i · Σx2iyi) / (Σx1i² · Σx2i² − (Σx1ix2i)²) …………… (17)
β̂2 = (Σx2iyi · Σx1i² − Σx1ix2i · Σx1iyi) / (Σx1i² · Σx2i² − (Σx1ix2i)²) …………… (18)
We can also express β̂1 and β̂2 in terms of the covariances and variances of Y, X1 and X2.
In matrix form, let C = (X′X)⁻¹X′; then β̂ = CY.
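As a sketch of these formulas, the following Python snippet (simulated data, arbitrary names) computes β̂ from the matrix expression and checks that, for the two-regressor case, it matches the deviation-form solutions derived above:

```python
# OLS via the matrix formula beta_hat = (X'X)^(-1) X'Y, compared with the
# deviation-form formulas for beta1_hat and beta2_hat. Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(size=n)

# Matrix form: beta_hat = C Y with C = (X'X)^(-1) X'
X = np.column_stack([np.ones(n), X1, X2])
C = np.linalg.inv(X.T @ X) @ X.T
print(C @ Y)                             # [beta0_hat, beta1_hat, beta2_hat]

# Deviation form for the two-regressor case
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x1 @ x2) * (x2 @ y)) / den
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ x2) * (x1 @ y)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b0, b1, b2)                        # matches the matrix-form result
```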
• We would like some measure of how well our regression model actually fits
the data.
• We have goodness of fit statistics to test this: i.e. how well the sample
regression function (srf) fits the data.
• The most common goodness of fit statistic is known as R². One way to define
R² is to say that it is the square of the correlation coefficient between y and ŷ.
• For another interpretation, recall that what we are interested in is
explaining the variability of y about its mean value ȳ, i.e. the total sum of
squares, TSS:
TSS = Σt (yt − ȳ)² = Σt (ŷt − ȳ)² + Σt ût²
• Our goodness of fit statistic is
R² = ESS / TSS
• But since TSS = ESS + RSS, we can also write
R² = (TSS − RSS) / TSS = 1 − RSS / TSS
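A minimal numerical sketch of these sums of squares (simulated data, names chosen for illustration); it verifies that ESS/TSS and 1 − RSS/TSS give the same R² when the model contains an intercept:

```python
# Compute TSS, ESS, RSS and R^2 for an OLS fit on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
y_hat = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares

print(ess / tss)                        # R^2 = ESS/TSS
print(1 - rss / tss)                    # identical, since TSS = ESS + RSS
```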
Problems with R2 as a Goodness of Fit Measure
R² = Σŷi² / Σyi² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = 1 − Σûi² / Σyi²
For the two-regressor model this can also be written as
R² = (β̂1 Σx1iyi + β̂2 Σx2iyi) / Σyi²
The value of R² lies between 0 and 1. The higher the R², the greater the
percentage of the variation of Y explained by the regression plane, that
is, the better the 'goodness of fit' of the regression plane to the sample
observations. The closer R² is to zero, the worse the fit.
Example: Determinants of College GPA
n = 141, R² = 0.176.
This means that hsGPA and ACT together explain about 17.6% of the
variation in college GPA for this sample of students.
This may not seem like a high percentage, but we must remember that there
are many other factors—including family background, personality, quality of
high school education, affinity for college—that contribute to a student‘s
college performance.
If hsGPA and ACT explained almost all of the variation in colGPA, then
performance in college would be preordained by high school performance!
An important fact about 𝑅2 is that it never decreases, and it
usually increases, when another independent variable is
added to a regression and the same set of observations is
used for both regressions.
An important caveat to the previous assertion about R-
squared is that it assumes we do not have missing data on the
explanatory variables.
If two regressions use different sets of observations, then, in
general, we cannot tell how the R-squareds will compare,
even if one regression uses a subset of regressors.
Adjusted R²
The adjusted R² imposes a penalty for adding regressors: it corrects the numerator
and denominator of 1 − RSS/TSS for their degrees of freedom,
adjusted R² = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k − 1),
where k is the number of explanatory variables (excluding the intercept).
It follows that the adjusted R² can fall, and can even be negative, when a regressor
with little explanatory power is added.
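A one-function sketch of the adjusted R² formula, evaluated at the n, k and R² of the college GPA example quoted earlier (the function name is mine):

```python
# Adjusted R^2 from R^2, sample size n and number of slope regressors k
# (model assumed to include an intercept).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# College GPA example: n = 141, two regressors, R^2 = 0.176
print(round(adjusted_r2(0.176, n=141, k=2), 3))   # approximately 0.164
```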
Multiple Regression Analysis: Inference
• Discussion of the normality assumption (cont.)
– Examples where normality cannot hold:
• Wages (nonnegative; also: minimum wage)
• Number of arrests (takes on a small number of integer values)
• Unemployment (indicator variable, takes on only 1 or 0)
– In some cases, normality can be achieved through transformations of
the dependent variable (e.g. use log(wage) instead of wage)
– Under normality, OLS is the best (even nonlinear) unbiased estimator
– Important: For the purposes of statistical inference, the assumption of
normality can be replaced by a large sample size
• Testing hypotheses about a single population parameter
• Theorem 3.1 (t-distribution for the standardized estimators): under the classical
assumptions plus normally distributed errors, (β̂j − βj) / se(β̂j) ~ t(n − k − 1),
where k is the number of explanatory variables.
Note: The t-distribution is close to the standard normal distribution if n − k − 1 is large.
• Goal: define a rejection rule so that, if H0 is true, it is rejected only with a small
probability (= significance level, e.g. 5%)
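A short mechanical sketch: the t statistic for H0: βj = 0 and the 5% critical values from the t distribution. The estimate, standard error and sample size are placeholders, and scipy is assumed to be available:

```python
# t-test for a single coefficient, H0: beta_j = 0 (placeholder numbers).
from scipy import stats

beta_hat_j, se_j = 0.0041, 0.0017            # hypothetical estimate and standard error
n, k = 526, 3                                # hypothetical sample size, number of regressors
df = n - k - 1                               # degrees of freedom

t_stat = (beta_hat_j - 0) / se_j             # standardized estimator under H0
crit_one_sided = stats.t.ppf(0.95, df)       # 5% one-sided critical value
crit_two_sided = stats.t.ppf(0.975, df)      # 5% two-sided critical value
print(t_stat, crit_one_sided, crit_two_sided)
# For H1: beta_j > 0, reject H0 at the 5% level if t_stat > crit_one_sided.
```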
• Testing against one-sided alternatives (greater than zero): test H0: βj = 0 against H1: βj > 0.
– Example: the effect of experience in a wage equation.
One would either expect a positive effect of experience on the hourly wage or no effect at all,
so the relevant alternative is one-sided.
• Testing against one-sided alternatives (less than zero)
– Example: student performance and school size. The dependent variable is the percentage of
students passing a maths test; the regressors are average annual teacher compensation, staff
per one thousand students, and school enrollment (= school size).
Test H0: βenroll = 0 against H1: βenroll < 0.
– The degrees of freedom are large here, so the standard normal approximation to the
t-distribution applies.
One cannot reject the hypothesis that there is no effect of school size on
student performance (not even for a lax significance level of 15%).
• Example: Student performance and school size (cont.)
– Alternative specification of functional form: the variables enter in logarithms.
Test H0: βlog(enroll) = 0 against H1: βlog(enroll) < 0.
– In this alternative specification the t-statistic on the school-size variable is large enough
to reject H0, but the estimated effect is small.
• Testing against two-sided alternatives: test H0: βj = 0 against H1: βj ≠ 0.
– If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may
think of dropping it from the regression
– If the sample size is small, effects might be imprecisely estimated so that the case for
dropping insignificant variables is less strong
• Computing p-values for t-tests
– If the significance level is made smaller and smaller, there will be a point
where the null hypothesis cannot be rejected anymore
– The reason is that, by lowering the significance level, one wants to avoid more
and more to make the error of rejecting a correct H0
– The smallest significance level at which the null hypothesis is still rejected is
called the p-value of the hypothesis test
– A small p-value is evidence against the null hypothesis because one would
reject the null hypothesis even at small significance levels
– A large p-value is evidence in favor of the null hypothesis
– P-values are more informative than tests at fixed significance levels
• How the p-value is computed (here: two-sided test)
– In the two-sided case, the p-value is the probability that the t-distributed variable
takes on a larger absolute value than the realized value of the test statistic:
p-value = P(|T| > |t|).
– Equivalently, if the p-value is smaller than the chosen significance level (e.g. 5%,
whose critical values bound the rejection region), reject H0 in favor of H1.
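A minimal sketch of the two-sided p-value calculation, assuming scipy is available; the t statistic and degrees of freedom are illustrative values only:

```python
# Two-sided p-value for a t statistic: p = P(|T| > |t|) with T ~ t(df).
from scipy import stats

t_stat, df = 1.85, 522                           # illustrative values
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)    # sf = upper-tail probability
print(round(p_two_sided, 4))
# Reject H0 at significance level alpha whenever p_two_sided < alpha.
```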
• Example: Model of firms' R&D expenditures (confidence intervals for the coefficients)
– The effect of sales on R&D is relatively precisely estimated, as its interval is narrow.
Moreover, the effect is significantly different from zero, because zero is outside the interval.
– The other effect is imprecisely estimated, as its interval is very wide. It is not even
statistically significant, because zero lies in the interval.
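The intervals referred to here are the usual t-based confidence intervals, β̂j ± c · se(β̂j), where c is the relevant t quantile; a small sketch with placeholder numbers:

```python
# 95% confidence interval for a single coefficient (placeholder numbers).
from scipy import stats

beta_hat_j, se_j, df = 0.0862, 0.0084, 29        # illustrative values only
c = stats.t.ppf(0.975, df)                       # 97.5% quantile of t(df)
ci = (beta_hat_j - c * se_j, beta_hat_j + c * se_j)
print(ci)   # zero outside the interval -> significantly different from zero at 5%
```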
Global hypothesis test (F and R²)
• We used the t-test to test single hypotheses, i.e. hypotheses involving only one
coefficient. But what if we want to test more than one coefficient simultaneously?
We do this using the F-test.
Now, we want to test whether a group of variables has no effect on the
dependent variable.
More precisely, the null hypothesis is that a set of variables has no
effect on y, once another set of variables has been controlled for.
• The F-test is also used to test the overall significance of a model.
F = (mean square of ESS) / (mean square of RSS) = [ESS/(k − 1)] / [RSS/(n − k)]
Or, using R-squared:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)]
where k is the number of estimated parameters (including the intercept).
Decision: if F > Fα(k−1, n−k), reject H0; otherwise you may accept H0
(i.e. reject when Fcal > Ftab), where Fα(k−1, n−k) is the critical F value at the α
level of significance, with (k − 1) numerator df and (n − k) denominator df.
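A small sketch of this decision rule using the R²-based form of the F statistic. The numbers reuse the college GPA example from earlier (R² = 0.176, n = 141, two regressors, so k = 3 parameters including the intercept), and scipy is assumed for the critical value:

```python
# Overall-significance F test computed from R^2.
from scipy import stats

r2, n, k = 0.176, 141, 3                          # k counts the intercept as well
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
F_crit = stats.f.ppf(0.95, k - 1, n - k)          # 5% critical value, F(k-1, n-k)
p_value = stats.f.sf(F, k - 1, n - k)
print(F, F_crit, p_value)                         # reject H0 if F > F_crit (or p < 0.05)
```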
• We used the t-test to test single hypotheses, i.e. hypotheses involving only
one coefficient. But what if we want to test more than one coefficient
simultaneously?
• The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted,
i.e. the restrictions are imposed on some of the βs.
The F-test:
Restricted and Unrestricted Regressions
• Example
The general regression is
yt = β1 + β2x2t + β3x3t + β4x4t + ut (1)
• We want to test the restriction that β3 + β4 = 1 (we have some hypothesis
from theory which suggests that this would be an interesting hypothesis to
study). The unrestricted regression is (1) above, but what is the restricted
regression?
yt = β1 + β2x2t + β3x3t + β4x4t + ut   s.t.   β3 + β4 = 1
Substituting β4 = 1 − β3 and rearranging gives the restricted regression:
(yt − x4t) = β1 + β2x2t + β3(x3t − x4t) + ut
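A sketch of how the restricted regression can be estimated in practice by transforming the variables as above, and of how the resulting RRSS and URSS feed the F test. The data are simulated and all names are hypothetical:

```python
# Restricted vs. unrestricted regression for the restriction beta3 + beta4 = 1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 120
x2, x3, x4 = rng.normal(size=(3, T))
y = 0.5 + 0.8 * x2 + 0.3 * x3 + 0.7 * x4 + rng.normal(scale=0.4, size=T)  # restriction holds here

# Unrestricted regression: y on x2, x3, x4
urss = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3, x4]))).fit().ssr

# Restricted regression: substitute beta4 = 1 - beta3, i.e.
# (y - x4) = beta1 + beta2*x2 + beta3*(x3 - x4) + u
rrss = sm.OLS(y - x4, sm.add_constant(np.column_stack([x2, x3 - x4]))).fit().ssr

m, k = 1, 4                                   # one restriction, four estimated parameters
F = ((rrss - urss) / m) / (urss / (T - k))
print(F)                                      # compare with the F(m, T - k) critical value
```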
For the special case of testing the overall significance of the regression (all slope
coefficients equal to zero), the restricted model contains only the intercept, so the
restricted residual sum of squares equals the total sum of squares:
RRSS = Σe0i² = Σ(Yi − Ȳ)² = Σyi²
while the unrestricted residual sum of squares is
URSS = Σei² = Σ(Yi − Ŷi)²
The total variation decomposes as
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σei²,  i.e.  Σyi² = Σŷi² + Σei²
since yi = ŷi + ei, with ŷi = β̂2x2i + β̂3x3i + … + β̂kxki in deviation form.
• We know that, with normally distributed errors,
URSS/σ² = Σei²/σ² ~ χ²(n − k)
and, since Σŷi² = Σj=2..k β̂j Σxjiyi,
ESS/σ² = Σŷi²/σ² ~ χ²(k − 1) under H0.
In summary, under H0,
TSS/σ² ~ χ²(n − 1),  RSS/σ² ~ χ²(n − k),  ESS/σ² ~ χ²(k − 1),
so the ratio of the ESS and RSS χ² variables, each divided by its degrees of freedom, gives
F = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k) under H0.
• Example: consumption function
Y = β1 + β2X2 + β3X3 + β4X4 + u
Where: Y = ln of consumption
X2 = ln of income of consumers
X3 = ln of prices
X4 = time
• The number of years over which these data were obtained was 17, thus n = 17.
• Examples of null hypotheses and the corresponding number of restrictions, m:
– H0: β1 + β2 = 2  (m = 1)
– H0: β2 = 1 and β3 = −1  (m = 2)
– H0: β2 = 0, β3 = 0 and β4 = 0  (m = 3)
• If the model is yt = β1 + β2x2t + β3x3t + β4x4t + ut,
then the null hypothesis
H0: β2 = 0, and β3 = 0 and β4 = 0 is tested by the regression F-statistic. It
tests the null hypothesis that all of the coefficients except the intercept
coefficient are zero.
• Note the form of the alternative hypothesis for all tests when more than one
restriction is involved: H1: β2 ≠ 0, or β3 ≠ 0 or β4 ≠ 0
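A short sketch of testing such exclusion restrictions jointly with statsmodels; the data are simulated, the names x2, x3, x4 mirror the model above, and f_test is statsmodels' method for linear restrictions:

```python
# Joint exclusion test H0: beta2 = beta3 = beta4 = 0 on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
T = 150
df = pd.DataFrame({
    "x2": rng.normal(size=T),
    "x3": rng.normal(size=T),
    "x4": rng.normal(size=T),
})
df["y"] = 1.0 + 0.4 * df["x2"] + rng.normal(size=T)   # x3 and x4 have no true effect

res = smf.ols("y ~ x2 + x3 + x4", data=df).fit()
print(res.f_test("x2 = 0, x3 = 0, x4 = 0"))   # joint F test of the three restrictions
print(res.fvalue, res.f_pvalue)               # the regression F statistic (same H0 here)
```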
The Relationship between the t and the F-
Distributions
• Any hypothesis which could be tested with a t-test could have been
tested using an F-test, but not the other way around.
• Example: testing whether batting average, home runs per year, and runs batted in per year
jointly have no effect on the dependent variable: H0: all three coefficients are zero,
against H1: at least one of them is nonzero.
• The F test statistic is compared with the appropriate critical value; if it does not
exceed the critical value, H0 cannot be rejected.
• Regression output for the unrestricted regression