CHAPTER THREE
MULTIPLE LINEAR REGRESSION ANALYSIS: ESTIMATION AND HYPOTHESIS TESTING
3.1 Introduction
So far, we have discussed the simplest form of regression analysis, simple linear regression. However, a more realistic representation of real-world economic relationships is obtained from multiple regression analysis, because most financial and economic variables are determined by more than a single variable.
In this chapter we expand the SLR model of chapter 2 to a multiple regression model, in which there is not one explanatory variable but k explanatory variables:
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$$
where $Y_i$ is the value of the dependent variable for observation $i$, $\alpha$ is the constant (the intercept), $\beta_k$ is the coefficient (or slope) of the kth explanatory variable, $X_{ki}$ is the value of the kth explanatory variable for observation $i$, and $u_i$ is the error term of observation $i$.
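To make the model concrete, here is a minimal Python sketch (not from the text; the sample size, parameter values and distributions are all hypothetical) that simulates data from a two-regressor version of this model:

```python
# Hypothetical simulation of Y_i = alpha + beta1*X1_i + beta2*X2_i + u_i
import numpy as np

rng = np.random.default_rng(0)
n = 100                                # assumed sample size
alpha, beta1, beta2 = 2.0, 0.5, -1.2   # assumed true parameters
X1 = rng.normal(10, 2, n)              # first explanatory variable
X2 = rng.normal(5, 1, n)               # second explanatory variable
u = rng.normal(0, 1, n)                # disturbance, independent of the X's
Y = alpha + beta1 * X1 + beta2 * X2 + u
```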
7. Independence of $u_i$ and $X_i$: every disturbance term $u_i$ is independent of the explanatory variables, i.e., $E(u_i X_{1i}) = E(u_i X_{2i}) = 0$. This means that there is no omitted variable bias.
8. X values do vary in the sample.
9. No perfect multicollinearity: the explanatory variables of the model are not perfectly correlated. That is, no explanatory variable is an exact linear combination of the others. Perfect collinearity is a problem because the least squares estimator cannot separately attribute variation in $Y$ to the individual independent variables.
Example: suppose we regress weight ($Y$) on height measured in meters ($X_1$) and height measured in centimeters ($X_2$). How could we decide which regressor to attribute the changing weight to? (A numerical sketch follows below.)
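The sketch below illustrates the problem numerically, with made-up height data: because height in centimeters is an exact linear function of height in meters, the design matrix is rank-deficient and $X'X$ cannot be inverted, so OLS has no unique solution.

```python
# Illustration of perfect multicollinearity with hypothetical data
import numpy as np

rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, 50)   # height in meters
height_cm = 100 * height_m            # exact linear combination
X = np.column_stack([np.ones(50), height_m, height_cm])

print(np.linalg.matrix_rank(X))       # prints 2, not 3: columns dependent
# np.linalg.inv(X.T @ X) would fail with a "singular matrix" error
```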
We can solve the above system using Cramer's rule and obtain $\hat\beta_1$ and $\hat\beta_2$ as follows:
$$\hat{\beta}_1 = \frac{\begin{vmatrix} \sum x_{1i} y_i & \sum x_{1i} x_{2i} \\ \sum x_{2i} y_i & \sum x_{2i}^2 \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}} \quad\text{and}\quad \hat{\beta}_2 = \frac{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} y_i \\ \sum x_{1i} x_{2i} & \sum x_{2i} y_i \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}}$$
Therefore, expanding the determinants, we obtain:
$$\hat{\beta}_1 = \frac{\sum x_{1i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \quad\text{and}\quad \hat{\beta}_2 = \frac{\sum x_{2i} y_i \sum x_{1i}^2 - \sum x_{1i} y_i \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$$
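As an illustration, these formulas can be computed directly. The following sketch (a hypothetical helper, not from the text) works with deviations from the sample means, which is what the lower-case $x$'s and $y$'s denote:

```python
# Two-regressor OLS via the Cramer's-rule formulas above
import numpy as np

def ols2_cramer(Y, X1, X2):
    # lower-case variables are deviations from the sample means
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
    s11, s22, s12 = (x1**2).sum(), (x2**2).sum(), (x1 * x2).sum()
    s1y, s2y = (x1 * y).sum(), (x2 * y).sum()
    det = s11 * s22 - s12**2                  # denominator determinant
    b1 = (s1y * s22 - s2y * s12) / det        # beta1-hat
    b2 = (s2y * s11 - s1y * s12) / det        # beta2-hat
    a = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # intercept
    return a, b1, b2
```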
In general, the estimates $\hat\beta_1$ and $\hat\beta_2$ have partial effect, or ceteris paribus, interpretations. From the above equation, we have
$$\Delta Y_i = \hat\beta_1 \Delta X_1 + \hat\beta_2 \Delta X_2$$
So we can obtain the predicted change in $Y$ given the changes in $X_1$ and $X_2$. In particular, when $X_2$ is held fixed, so that $\Delta X_2 = 0$, then
$$\Delta Y_i = \hat\beta_1 \Delta X_1, \quad\text{holding } X_2 \text{ fixed.}$$
The key point is that, by including $X_2$ in our model, we obtain a coefficient on $X_1$ with a ceteris paribus interpretation. This is why multiple regression analysis is so useful. Similarly,
$$\Delta Y_i = \hat\beta_2 \Delta X_2, \quad\text{holding } X_1 \text{ fixed.}$$
How can one interpret the coefficients of educ and exper? (NB: the coefficients have a percentage interpretation when multiplied by 100.)
The coefficient 0.125 means that, holding exper fixed, another year of education is predicted to increase $\ln(\text{wage})$ by 0.125, i.e., to increase wage by about 12.5%, on average. Alternatively, if we take two people with the same level of experience, the coefficient on educ is the proportionate difference in predicted wage when their education levels differ by one year. Similarly, the coefficient on exper, 0.085, means that, holding educ fixed, another year of related work experience is predicted to increase wage by about 8.5%, on average.
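Strictly speaking, $100 \times \hat\beta$ is only an approximation to the percentage change; the exact percentage effect of a one-unit change is $100(e^{\hat\beta} - 1)$. The short snippet below compares the two for the coefficients above:

```python
# Approximate vs. exact percentage interpretation of log-wage coefficients
import numpy as np

for name, coef in [("educ", 0.125), ("exper", 0.085)]:
    approx = 100 * coef                  # the "multiply by 100" rule
    exact = 100 * (np.exp(coef) - 1)     # exact percentage change
    print(f"{name}: approx {approx:.1f}%, exact {exact:.1f}%")
# educ: approx 12.5%, exact 13.3%
# exper: approx 8.5%, exact 8.9%
```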
How can we interpret this estimation? $\hat\alpha = 14661$ means that, if there are no radio and tv ads ($X_1 = 0$ and $X_2 = 0$), we expect sales of 14661 blocks of soap. $\hat\beta_1 = 192$ means that if radio ads increase by one, we expect soap sales to increase by 192 blocks, assuming that tv ads are unchanged (ceteris paribus, thus controlling for the number of tv ads). This is a significant effect (t: 2.54, p: 0.023, which is less than 0.05).
$\hat\beta_2 = -77$ means that if the team adds one tv ad, soap sales are expected to drop by 77 blocks (assuming that radio ads stay the same, thus controlling for the number of radio ads). This is not in line with our expectation, as theory predicts that ads will increase sales. A possible reason may be that the tv ad is of bad quality (it does not convince consumers to buy the soap). Also, when looking at the t-test results, we see that $\hat\beta_2$ does not differ significantly from 0 (t: -0.2, p: 0.393, which is more than 0.05). Therefore, we cannot conclude that tv ads influence soap sales.
The implication of this model for management is that they should focus on radio advertising rather than tv advertising to boost sales. If they want to use tv advertising, they must change the content of the ad, as the current ad does not lead to increased sales. For the method of calculating an MLR with more than two explanatory variables by hand, read the Reading Assignment for Chapter 3.
In deviation form, the total variation in $Y$ decomposes into explained and residual parts:
$$\sum y_i^2 = \hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i + \sum e_i^2 \quad (3.19)$$
As in simple regression, $R^2$ is viewed as a measure of the prediction ability of the model over the sample period, or as a measure of how well the estimated regression fits the data. If $R^2$ is high, the model is said to "fit" the data well; if $R^2$ is low, the model does not fit the data well.
The adjusted $R^2$ is defined as
$$\bar{R}^2 = 1 - \frac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)}$$
where n is the sample size and k is the number of parameter estimates ($\alpha$ and the $\beta$'s) in the model. This measure does not always go up when a variable is added, because the residual sum of squares in the numerator is divided by its degrees of freedom, $n - k$. That is, $\bar{R}^2$ imposes a penalty for adding additional regressors to the model. If a regressor is added, RSS decreases or at least remains constant; on the other hand, the degrees of freedom of the regression, $n - k$, always decrease.
Consider the example given in Figure 2, about the effect of advertisement (radio and tv) on sales of soap. In the output, the TSS, ESS and RSS are displayed. The $R^2$ is
$$R^2 = \frac{MSS}{TSS} = \frac{3.2899 \times 10^9}{1.0348 \times 10^{10}} = 0.3179$$
(MSS, the model sum of squares in the Stata output, is the ESS.)
The adjusted $R^2$ considers not the sums of squares (SS) but the mean squares (MS). If the RSS is divided by $n-k$ (in this case $17-3=14$), the residual mean square (RMS) is 504120809. If the TSS is divided by $n-1$ (in this case $17-1=16$), the total mean square (TMS) is 646722450. This allows us to calculate the adjusted $R^2$ as follows:
$$\bar{R}^2 = 1 - \frac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)} = 1 - \frac{7.0577 \times 10^9/14}{1.0348 \times 10^{10}/16} = 0.2205$$
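The same arithmetic can be reproduced in Python from the reported sums of squares:

```python
# R-squared and adjusted R-squared for the soap example (n = 17, k = 3)
n, k = 17, 3
ess, rss, tss = 3.2899e9, 7.0577e9, 1.0348e10   # as reported in the output

r2 = ess / tss                                   # 0.3179
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))   # 0.2205
print(round(r2, 4), round(adj_r2, 4))
```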
Testing the significance of $\hat\beta_1$, for instance, asks whether $X_1$ influences $Y$ while holding the effect of $X_2$ on $Y$ constant. Mathematically, the test of individual significance involves testing the following two pairs of null and alternative hypotheses:
$$\text{A. } H_0: \beta_1 = 0 \text{ vs. } H_A: \beta_1 \neq 0 \qquad \text{B. } H_0: \beta_2 = 0 \text{ vs. } H_A: \beta_2 \neq 0$$
The null hypothesis in A states that, holding $X_2$ constant, $X_1$ has no significant (linear) influence on $Y$. Similarly, the null hypothesis in B states that, holding $X_1$ constant, $X_2$ has no influence on the dependent variable $Y$. To test the individual significance of parameter estimates in MLRMs, it is common to use the Student's t-test (as you have seen in chapter 2).
As you have seen in chapter 2, Stata output presents the t-test statistic ($t_c$) and the corresponding p-value (the probability of obtaining this, or a more extreme, t if $H_0$ is true). Note that the output in Figure 3 indicates that the constant ($\hat\alpha$) is 14661, with a t of 0.88 and a p-value of 0.393. As the p-value of the constant is > 0.05, we conclude that $\hat\alpha$ is not statistically significantly different from 0.
$\hat\beta_1$, the coefficient of the explanatory variable radio_ads, is 191.66, with a t-value of 2.54 and a p-value of 0.023. As the p-value < 0.05, it can be concluded that $\beta_1$ is significantly different from 0. As the coefficient is positive, we can state that, after controlling for tv_ads, radio_ads has a significant positive effect on sales. What about tv_ads? Work out the answer yourself and check it in class.
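The mechanics behind these numbers can be sketched as follows. The standard error is backed out from the reported coefficient and t-value (an assumption, since Figure 3 is not reproduced here), and the p-value uses $n - k = 17 - 3 = 14$ degrees of freedom:

```python
# t-test for an individual coefficient: t = coef / se, two-sided p-value
from scipy import stats

coef, t_stat, df = 191.66, 2.54, 14
se = coef / t_stat                          # implied standard error, ~75.5
p_value = 2 * stats.t.sf(abs(t_stat), df)   # ~0.023, matching the output
print(se, p_value)
```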
Thus, this test has the following null and alternative hypotheses:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$H_A: \text{at least one of the } \beta\text{'s is different from zero}$$
The null hypothesis of the joint test states that none of the explanatory variables included in the model is relevant, in the sense that no amount of the variation in $Y$ can be attributed to the variation in all the explanatory variables simultaneously. That is, even if all the explanatory variables of the model change simultaneously, the value of $Y$ is left unchanged.
If the null hypothesis is true, that is, if all the explanatory variables included in the model are irrelevant, then there would be no significant difference in explanatory power between the models with and without the explanatory variables. Thus, the test of the overall significance of an MLRM can be approached by testing whether the difference in explanatory power between the model with and the model without all the explanatory variables is significant. If the difference is insignificant we accept the null hypothesis, and we reject it if the difference is significant.
Similarly, this test can be done by comparing the sums of squared errors (RSS) of the model with and without all the explanatory variables. In this case we accept the null hypothesis if the difference between the two residual sums of squares is insignificant. The intuition is straightforward: if all the explanatory variables are irrelevant, then including them in the model contributes an insignificant amount to the explanation of the model, and as a result the sample prediction error of the model is not reduced significantly.
Let the Restricted Residual Sum of Squares (RRSS) be the sum of squared errors of the model without the explanatory variables, i.e., the residual sum of squares obtained assuming that all the explanatory variables are irrelevant (under the null hypothesis), and let the Unrestricted Residual Sum of Squares (URSS) be the sum of squared errors of the model with all the explanatory variables included. It is always true that $RRSS \geq URSS$ (why?). To elaborate these concepts, consider the following model:
$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + e_i$$
This model is called the unrestricted model. The test of the joint hypothesis is given by:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$H_A: \text{at least one of the } \beta\text{'s is different from zero}$$
We know that:
$$Y_i = \hat{Y}_i + e_i \;\Rightarrow\; e_i = Y_i - \hat{Y}_i$$
$$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$$
This sum of squared errors is called the unrestricted residual sum of squares (URSS).
However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients are zero, the model shrinks to:
$$Y_i = \beta_0 + e_i$$
This model is called the restricted model. Applying OLS, we obtain:
$$\hat\beta_0 = \frac{\sum Y_i}{n} = \bar{Y} \quad (3.23)$$
Therefore, $e_i = Y_i - \hat\beta_0$, but $\hat\beta_0 = \bar{Y}$, so
$$e_i = Y_i - \bar{Y}$$
$$\therefore \sum e_i^2 = \sum (Y_i - \bar{Y})^2 = \sum y_i^2 = TSS$$
The sum of squared errors when the null hypothesis is assumed to be true is called the Restricted Residual Sum of Squares (RRSS), and this is equal to the total sum of squares (TSS).
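A quick numerical check with made-up data confirms this equality:

```python
# The intercept-only (restricted) model's RSS equals TSS
import numpy as np

Y = np.array([12.0, 15.0, 11.0, 18.0, 14.0])   # arbitrary illustrative data
b0 = Y.mean()                      # OLS intercept of the restricted model
rrss = ((Y - b0) ** 2).sum()       # restricted residual sum of squares
tss = ((Y - Y.mean()) ** 2).sum()  # total sum of squares
print(np.isclose(rrss, tss))       # True
```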
The ratio
$$\frac{(RRSS - URSS)/(K-1)}{URSS/(n-K)} \sim F(K-1,\; n-K) \quad (3.24)$$
has an F-distribution with $K-1$ and $n-K$ degrees of freedom for the numerator and denominator, respectively.
$$RRSS = TSS$$
$$URSS = \sum y_i^2 - \hat\beta_1 \sum x_1 y - \hat\beta_2 \sum x_2 y - \cdots - \hat\beta_k \sum x_k y = RSS$$
i.e., $URSS = RSS$. Hence
$$F = \frac{(TSS - RSS)/(K-1)}{RSS/(n-K)} \sim F(K-1,\; n-K)$$
and, since $TSS - RSS = ESS$,
$$F_c(K-1,\; n-K) = \frac{ESS/(K-1)}{RSS/(n-K)} \quad (3.25)$$
If we divide the numerator and denominator of the above equation by TSS, then, since $ESS/TSS = R^2$ and $RSS/TSS = 1 - R^2$:
$$F_c = \frac{\dfrac{ESS/TSS}{K-1}}{\dfrac{RSS/TSS}{n-K}}$$
$$\therefore F_c = \frac{R^2/(K-1)}{(1-R^2)/(n-K)} \quad (3.26)$$
This implies that the computed value of F can be calculated either from ESS and RSS, or from $R^2$ and $1-R^2$. This value is compared with the table value of F that leaves a probability of $\alpha$ in the upper tail of the F-distribution with $K-1$ and $n-K$ degrees of freedom.
If the null hypothesis is not true, then the difference between RRSS and URSS (i.e., between TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the computed value of F (the F test statistic) is large, or equivalently if the p-value of the F-statistic is lower than the chosen level of significance ($\alpha$), and vice versa.
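For the soap example, equation (3.26) can be evaluated directly from the $R^2$ computed earlier; the sketch below computes the F and its p-value (they are calculated here, not read from the Stata output):

```python
# Overall F-test from R-squared, eq. (3.26): soap example, K = 3, n = 17
from scipy import stats

r2, n, K = 0.3179, 17, 3
F = (r2 / (K - 1)) / ((1 - r2) / (n - K))   # about 3.26
p = stats.f.sf(F, K - 1, n - K)             # upper-tail p-value
print(F, p)   # reject H0 at significance level alpha only if p < alpha
```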
Implication: rejection of $H_0$ implies that the parameters of the model are jointly significant, i.e., the dependent variable $Y$ is linearly related to the independent variables included in the model.
For example, based on the output in figure 4, it is found that $\hat\alpha = 14661.19$, $\hat\beta_1 = 191.6551$ and $\hat\beta_2 = -77.49$. Therefore, if we want to forecast the sales of a week with 1500 radio ads and 25 tv ads, the predicted sales are:
$$\hat{Y} = 14661.19 + 191.6551 \times 1500 - 77.49 \times 25 \approx 300{,}207$$
This is point forecasting, because it indicates one point (one exact number) as the forecast. However, it is very unlikely that sales will be exactly 300207 when radio ads are 1500 and tv ads are 25; there is some variation around this prediction. This can be expressed in an interval forecast, which is a confidence interval for the predicted value. Like the confidence interval discussed in chapter 2, it uses a t-value and a standard error.
The t-value is obtained from the table and depends on the accuracy of the interval. The accuracy is measured by $\alpha$, the acceptable probability of drawing a wrong conclusion given the data (type I error). Once $\alpha$ is known, the t-value can be obtained from the table: $t_{(1-\alpha/2,\; n-k)}$. For example, to calculate a 95% prediction interval from a simple-regression model (two estimated parameters) based on a sample of 200: $\alpha = 100\% - 95\% = 5\% = 0.05$ and $n - k = 200 - 2 = 198$, so the table value is $t_{0.975,198} = 1.96$ (see table 2.5 in chapter 2).
Next, the standard error of the forecast/prediction is needed. The standard error of the forecast can be derived by
$$se_F = \sqrt{RMS\left(1 + X_h (X'X)^{-1} X_h'\right)}$$
where RMS is the residual mean square (see section 3.3 if you forgot what that was), $X_h$ is the row vector with the X-values of the case we want to predict/forecast (in our case (1500 25), preceded by a 1 for the constant term), and $X$ is the matrix with all X-values of the whole sample.
Rather than calculating this standard error by hand, econometricians use Stata to compute the standard error of the forecast and to find the interval forecast. Figure 5 displays the forecast (Coef.) for the case where $X_1 = 1500$ and $X_2 = 25$. The predicted value is indeed 300207 blocks of soap, and the standard error of this prediction is 101034. This leads to a 95% prediction interval for sales between 83510 and 516902 blocks of soap. That means we can be 95% confident that a new observation (a new week) with 1500 radio ads and 25 tv ads will result in sales between 83510 and 516902 blocks of soap. Note that this prediction interval is very wide, which suggests a lot of uncertainty about the prediction. We would need additional explanatory variables to better understand the effect of ad quantities on sales.
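For reference, the whole interval-forecast calculation can be sketched in Python. The design matrix X (a column of ones, radio_ads, tv_ads) and the residual mean square rms are assumed to come from the fitted soap model, which is not reproduced here:

```python
# Interval forecast for a new observation xh (includes a 1 for the constant)
import numpy as np
from scipy import stats

def prediction_interval(X, rms, xh, yhat, alpha=0.05):
    n, k = X.shape
    # forecast standard error: sqrt(RMS * (1 + xh' (X'X)^-1 xh))
    se_f = np.sqrt(rms * (1 + xh @ np.linalg.inv(X.T @ X) @ xh))
    t_crit = stats.t.ppf(1 - alpha / 2, n - k)   # e.g. 2.145 for 14 df
    return yhat - t_crit * se_f, yhat + t_crit * se_f

# With the reported se_f = 101034 and 14 df: 300207 +/- 2.145 * 101034
# gives roughly (83510, 516902), matching the Stata output in Figure 5.
```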