Week 4 - The Multiple Linear Regression Model (Part 1)
Generalising the simple model to the multiple linear regression model (MLRM)
• Now we write
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \ldots + \beta_k x_{kt} + u_t$,  t = 1, 2, ..., T
• E.g., $\hat{\beta}_2$ measures the effect of $x_2$ on $y$ after eliminating the effects of $x_3, x_4, \ldots, x_k$.
• Where is $x_1$? It is the constant term. In fact the constant term is usually
represented by a column of ones of length T:
$x_1 = (1 \;\; 1 \;\; \cdots \;\; 1)'$
• There is a variable implicitly hiding next to $\beta_1$, which is a column vector of
ones, the length of which is the number of observations in the sample.
• In matrix form the model is written $y = X\beta + u$,
where y is $T \times 1$
X is $T \times k$
$\beta$ is $k \times 1$
u is $T \times 1$
Inside the matrices of the MLRM
• For the case of a constant and one regressor (k = 2), writing $y = X\beta + u$ out in full gives:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}$$
Numerical example
• The coefficient estimates are stacked into a vector and obtained from the OLS formula
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X'y$$
• For the example data this gives the fitted line $\hat{Y} = -2.68 + 9.500X$.
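• As a minimal sketch of how this formula can be evaluated (using Python/NumPy with made-up data for illustration, not the data behind the example above):

    import numpy as np

    # Illustrative data: T = 5 observations, k = 2 parameters (intercept + one slope)
    y = np.array([7.0, 16.0, 26.0, 35.0, 45.0])
    X = np.column_stack([np.ones(5),                            # x1: the column of ones
                         np.array([1.0, 2.0, 3.0, 4.0, 5.0])])  # x2: the regressor

    # OLS estimator: beta_hat = (X'X)^(-1) X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solving is preferred to forming the inverse explicitly
    print(beta_hat)                                # [beta1_hat, beta2_hat]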
Testing multiple hypotheses: the F-test
• We used the t-test to test single hypotheses, i.e., hypotheses involving only one
coefficient. But what if we want to test more than one coefficient
simultaneously?
• The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted, i.e.,
the restrictions are imposed on some of the βs.
Calculating the F-test statistic
• The RSS from each regression are determined, and the two residual sums of squares are
‘compared’ in the test statistic. The F-test statistic for testing multiple hypotheses about
the coefficient estimates is given by:
$$\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T - k}{m}$$
where URSS = RSS from unrestricted regression
RRSS = RSS from restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in the unrestricted regression, including the
constant (i.e., the total number of parameters to be estimated).
• If the residual sum of squares increases considerably after the restrictions are imposed,
the restrictions are not supported by the data, and therefore the null hypothesis should
be rejected, and vice versa.
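• A minimal numerical sketch of the calculation (the RSS values, m, T and k below are invented purely for illustration; scipy supplies the critical value):

    from scipy.stats import f

    RRSS, URSS = 120.0, 100.0    # illustrative residual sums of squares
    m, T, k = 2, 100, 5          # restrictions, observations, parameters in the unrestricted model

    F_stat = ((RRSS - URSS) / URSS) * ((T - k) / m)
    F_crit = f.ppf(0.95, dfn=m, dfd=T - k)     # 5% critical value from F(m, T-k)

    print(F_stat, F_crit)        # reject H0 (the restrictions) if F_stat > F_crit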
The F-test:
Restricted and unrestricted regressions
• Example
The general regression is:
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$   (1)
• We want to test the restriction that $\beta_3 + \beta_4 = 1$ (we have some hypothesis
from theory which suggests that this would be an interesting hypothesis to
study). The unrestricted regression is (1) above, but what is the restricted
regression?
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$  s.t.  $\beta_3 + \beta_4 = 1$   (2)
• To impose the restriction, substitute $\beta_4 = 1 - \beta_3$ into (1) and gather terms:
$y_t - x_{4t} = \beta_1 + \beta_2 x_{2t} + \beta_3 (x_{3t} - x_{4t}) + u_t$
• Any variables without coefficients attached (here $x_{4t}$) are taken over to
the LHS and are then combined with y. This is the restricted regression. We
actually estimate it by creating two new variables, call them, say, $P_t$ and $Q_t$:
$P_t = y_t - x_{4t}$
$Q_t = x_{3t} - x_{4t}$
so the restricted regression that is actually estimated is
$P_t = \beta_1 + \beta_2 x_{2t} + \beta_3 Q_t + u_t$
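• A minimal sketch of how the unrestricted and restricted regressions could be estimated in practice (simulated data and statsmodels are assumed; this is an illustration, not the slide's own example):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    T = 200
    x2, x3, x4 = rng.normal(size=(3, T))
    y = 1.0 + 0.5 * x2 + 0.3 * x3 + 0.7 * x4 + rng.normal(size=T)  # true beta3 + beta4 = 1

    # Unrestricted regression: y on a constant, x2, x3 and x4
    urss = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3, x4]))).fit().ssr

    # Restricted regression imposing beta3 + beta4 = 1:
    # regress P = y - x4 on a constant, x2 and Q = x3 - x4
    P, Q = y - x4, x3 - x4
    rrss = sm.OLS(P, sm.add_constant(np.column_stack([x2, Q]))).fit().ssr

    m, k = 1, 4
    F_stat = ((rrss - urss) / urss) * ((T - k) / m)   # small here, since the restriction is true
    print(F_stat)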
• The test statistic follows the F-distribution under the null hypothesis.
• The F-distribution has 2 degrees of freedom parameters (recall that the t-distribution had
only 1 degree of freedom parameter, equal to T − k).
• The value of the degrees of freedom parameters for the F-test are m, the number of
restrictions imposed on the model, and (T − k), the number of observations less the number
of regressors for the unrestricted regression, respectively.
• The appropriate critical value will be in column m, row (T − k) of the F-distribution tables.
• We therefore only reject the null if the test statistic > critical F-value.
The relationship between the t and the F-distributions
• Any hypothesis which could be tested with a t-test could have been
tested using an F-test, but not the other way around.
• Examples :
H0: hypothesis                                        No. of restrictions, m
$\beta_1 + \beta_2 = 2$                                             1
$\beta_2 = 1$ and $\beta_3 = -1$                                       2
$\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$                                3
• Note the form of the alternative hypothesis for all tests when more than one restriction is
involved: H1: $\beta_2 \neq 0$, or $\beta_3 \neq 0$ or $\beta_4 \neq 0$
• ‘and’ occurs under the null hypothesis and ‘or’ under the alternative, so that it takes only one
part of a joint null hypothesis to be wrong for the null hypothesis as a whole to be rejected.
What we cannot test with either an F or a t-test
• We cannot test, using this framework, hypotheses which are not linear
or which are multiplicative, e.g.,
H0: $\beta_2 \beta_3 = 2$ or H0: $\beta_2^2 = 1$
cannot be tested.
F-test example
• Question: Suppose a researcher wants to test whether the returns on a company stock (y) show
unit sensitivity to two factors (factor x2 and factor x3) among three considered. The regression is
carried out on 144 monthly observations. The regression is:
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.
• Solution:
Unit sensitivity implies H0: $\beta_2 = 1$ and $\beta_3 = 1$. The unrestricted regression is the one in the
question. The restricted regression imposes both restrictions and is $y_t - x_{2t} - x_{3t} = \beta_1 + \beta_4 x_{4t} + u_t$.
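• A sketch of the arithmetic (the critical value quoted is approximate): with RRSS = 436.1, URSS = 397.2, m = 2, T = 144 and k = 4,
$$\text{test statistic} = \frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} \approx 6.86$$
Since the 5% critical value from the F(2, 140) distribution is roughly 3.06, the test statistic exceeds it and the joint null hypothesis of unit sensitivity is rejected.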
• When a single hypothesis about one coefficient is tested (m = 1), the F-test gives exactly the
same result as the corresponding t-test.
• In that case, the F-test statistic is equal to the square of the t-ratio on that coefficient.
Sample EViews output for multiple hypothesis tests
• The F-version is adjusted for small sample bias and should be used when the
regression is estimated using a small sample.
• Both statistics asymptotically yield the same result, and in this case the p-values are
very similar.
• The conclusion is that the joint null hypothesis, H0: β1 = 1 and β2 = 1, is not rejected.
Multiple regression in EViews using an APT-style model
• The question is whether the monthly returns on Microsoft stock can be explained by reference to unexpected
changes in a set of macroeconomic and financial variables.
• Microsoft stock price → dependent variable
• S&P500 index value
• consumer price index (CPI)
• Industrial production index (IPI)
• Treasury bill yields for the following maturities:
– three months
– six months
– one year
– three years
– five years
– ten years
• measure of ‘narrow’ money supply
• consumer credit
• ‘credit spread’ (difference in annualised average yields between a portfolio of bonds rated AAA
and a portfolio of bonds rated BAA).
Multiple regression in EViews using an APT-style model
• Generate a set of changes or differences for each of the variables, since the APT posits that
the stock returns can be explained by the unexpected changes in the macroeconomic
variables rather than by their levels.
• The unexpected value of a variable can be defined as the difference between the actual
(realised) value of the variable and its expected value.
• How might investors have formed their expectations? There are many ways to construct
measures of expectations:
– the easiest is to assume that investors have naive expectations, i.e., that the next period's
value of the variable is equal to the current value.
• This being the case, the entire change in the variable from one period to the next is the
unexpected change (because investors are assumed to expect no change).
– It is an interesting question as to whether the differences should be taken on the levels
of the variables or their logarithms.
– If the former, we have absolute changes in the variables, whereas the latter would lead
to proportionate changes.
– We assume that the former is chosen.
Multiple regression in EViews using an APT-style model
• mustb3m = ustb3m/12
• rmsoft = 100*dlog(microsoft)
– ermsoft = rmsoft - mustb3m
• rsandp = 100*dlog(sandp)
– ersandp = rsandp - mustb3m
• dprod = industrial production - industrial production(-1)
• dcredit = consumer credit - consumer credit(-1)
• inflation = 100*dlog(cpi)
– dinflation = inflation - inflation(-1)
• dmoney = m1money supply - m1money supply(-1)
• dspread = baa_aaa_spread - baa_aaa_spread(-1)
• rterm = term - term(-1)
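• The same transformations could also be generated outside EViews; below is a minimal pandas sketch (the DataFrame df and its column names are hypothetical stand-ins for the raw monthly series listed above):

    import numpy as np
    import pandas as pd

    def dlog(series):
        # 100 times the log-difference, i.e., an approximate percentage change
        return 100 * np.log(series).diff()

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        out = pd.DataFrame(index=df.index)
        mustb3m = df["ustb3m"] / 12                        # monthly risk-free rate
        out["ermsoft"] = dlog(df["microsoft"]) - mustb3m   # excess Microsoft return
        out["ersandp"] = dlog(df["sandp"]) - mustb3m       # excess S&P500 return
        out["dprod"] = df["indprod"].diff()                # change in industrial production
        out["dcredit"] = df["ccredit"].diff()              # change in consumer credit
        out["dinflation"] = dlog(df["cpi"]).diff()         # change in inflation
        out["dmoney"] = df["m1money"].diff()               # change in money supply
        out["dspread"] = df["baa_aaa_spread"].diff()       # change in the credit spread
        out["rterm"] = df["term"].diff()                   # change in the term structure
        return out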
Multiple regression in EViews using an APT-style model
• The test statistic follows an F(3, 244) distribution: 3 restrictions, 252 usable observations and 8 parameters to estimate in the unrestricted
regression (so T − k = 252 − 8 = 244).
• The F-statistic value suggests that the null hypothesis cannot be rejected.
• The parameters on DINFLATION and DMONEY are almost significant at the 10% level and the variables are
retained.
Multiple regression in EViews using an APT-style model
• This starts with no variables in the regression (or only those variables that are
always required by the researcher to be in the regression) and then selects
first the variable that would have the lowest p-value (largest absolute t-ratio) if it were included,
then the variable with the second lowest p-value conditional upon the first
variable already being included, and so on.
• The procedure continues until the lowest p-value among the variables not yet
included is larger than some specified threshold value; the selection then stops,
with no more variables being incorporated into the model. A sketch of this idea is given below.
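• A minimal sketch of this forward-selection idea (a generic illustration written against statsmodels rather than EViews's own routine; the 0.10 p-value threshold is just an assumed example):

    import statsmodels.api as sm

    def forward_select(y, X_candidates, threshold=0.10):
        # Add regressors one at a time by lowest p-value until none beats the threshold.
        selected = []
        remaining = list(X_candidates.columns)
        while remaining:
            # p-value each remaining variable would have if it were added to the current model
            pvals = {}
            for var in remaining:
                X = sm.add_constant(X_candidates[selected + [var]])
                pvals[var] = sm.OLS(y, X).fit().pvalues[var]
            best = min(pvals, key=pvals.get)
            if pvals[best] > threshold:       # next-best variable is not significant: stop
                break
            selected.append(best)
            remaining.remove(best)
        return selected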
Multiple regression in EViews using an APT-style model
• 'Forwards' will start with the list of required regressors (the intercept only in this case)
and will sequentially add to them.
• 'Backwards' will start by including all of the variables and will sequentially delete
variables from the regression.