Multiple Regression
Amar Saxena
Amar.Saxena@IFMR.ac.in
+91.993.002.2910
Why Multiple Regression
• Our real world is multivariable
• Multivariable analysis is a tool to determine the relative
contribution of all factors
Error Term
• The errors are normally distributed
• Errors have a constant variance
• The model errors are independent
Errors (residuals) from the regression model: εi = (Yi – Ŷi)
Amar Saxena | 993.002.2910 | AmarSaxena@gmail.com Slide 5
What should be the price of a flat?
Example:
Suppose you are considering purchasing a flat and want to identify the
key determinants of its price.
• You believe that the price of a flat depends on:
o x1 = Number of bedrooms in the flat
o x2 = Number of bathrooms in the flat
o x3 = Size of the flat
Data for Multiple Regression Model
y = a + b1x1 + b2x2 + b3x3 + ε
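As a minimal sketch of how a, b1, b2 and b3 could be estimated, the Python snippet below fits the model by ordinary least squares with NumPy and forms the residuals εi = Yi – Ŷi. The data are synthetic and hypothetical (the data set behind the slides is not reproduced here), and the names `bedrooms`, `bathrooms`, `size` and `price` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical, synthetic flat data (NOT the data set behind the slides).
rng = np.random.default_rng(0)
n = 200
bedrooms = rng.integers(1, 5, size=n)     # x1
bathrooms = rng.integers(1, 4, size=n)    # x2
size = rng.uniform(500, 2000, size=n)     # x3 (unit is an assumption)
noise = rng.normal(0, 100_000, size=n)    # error term
price = 500_000 + 200_000 * bedrooms + 150_000 * bathrooms + 3_000 * size + noise

# Design matrix with a leading column of ones for the intercept a.
X = np.column_stack([np.ones(n), bedrooms, bathrooms, size])

# Ordinary least squares: coef holds [a, b1, b2, b3].
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

fitted = X @ coef
residuals = price - fitted  # errors from the regression model
print(coef)
```

Because the model includes an intercept, the residuals average out to (numerically) zero; their spread is what the standard error of estimate summarises.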
Step 1
y = 475531.87 – 1005440.73 x1 + 459811.58 x2 + 3096.01 x3 + ε
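To show how the estimated equation is used, the sketch below plugs a flat's characteristics into it. The coefficients are the ones on the slide. The input values (x1 = 2, x2 = 2, x3 = 1000) are hypothetical, the unit of x3 is not stated in the source, and `predict_price` is an illustrative name.

```python
# Coefficients taken from the estimated equation on the slide.
a, b1, b2, b3 = 475531.87, -1005440.73, 459811.58, 3096.01

def predict_price(x1, x2, x3):
    """Fitted price from the estimated equation (error term omitted)."""
    return a + b1 * x1 + b2 * x2 + b3 * x3

# Hypothetical flat: x1 = 2, x2 = 2, x3 = 1000 (size unit assumed).
price = predict_price(2, 2, 1000)
print(round(price, 2))  # 2480283.57
```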
Model Assessment
• The model is assessed using three measures:
o The standard error of estimate
o The coefficient of determination
o The F-test of the analysis of variance
• The standard error of estimate is also used in the calculation of
the other two measures.
Standard Error of Estimate
• The standard deviation of the error is estimated by the
Standard Error of Estimate:
sε = √( SSE / (n – k – 1) )
(n – k – 1 degrees of freedom, because k + 1 coefficients were estimated)
• The magnitude of sε is judged by comparing it to the mean of Y (Ȳ)
Mean y = ₹ 38,33,291.10
sε = 2524622.335
sε is not particularly small: it is about 66% of the mean of Y.
• Question:
Can we conclude that the model does not fit
the data well? Not necessarily.
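The comparison above can be reproduced with a few lines of Python. This is only a sketch: `standard_error_of_estimate` is an illustrative helper name, and the two figures are the ones quoted on the slide.

```python
import math

def standard_error_of_estimate(sse, n, k):
    """s = sqrt(SSE / (n - k - 1)) for a model with k independent variables."""
    return math.sqrt(sse / (n - k - 1))

# Figures quoted on the slide.
mean_y = 3833291.10
s_eps = 2524622.335

# Standard error of estimate relative to the mean of Y.
ratio = s_eps / mean_y
print(f"{ratio:.0%}")  # 66%
```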
Coefficient of Determination (R2)
• We know that R2 = 1 – SSE / Σ(Yi – Ȳ)² = 1 – SSE / SST
R2 = 0.479
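A minimal sketch of the R2 calculation, using a tiny made-up data set so the arithmetic can be checked by hand (the values of `y` and `y_hat` are hypothetical, not the flat data):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)     # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - sse / sst

# Hypothetical observed and fitted values: SSE = 0.10, SST = 5.0.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
print(round(r2, 2))  # 0.98
```

Read the same way, R2 = 0.479 means the three variables explain about 48% of the variation in flat prices.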
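Analysis of variance quantities for the F-test:
Source       df           Sum of squares   Mean square
Regression   k            SSR              MSR = SSR / k
Residual     n – k – 1    SSE              MSE = SSE / (n – k – 1)
Total        n – 1        SST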
Conclusion: There is sufficient evidence to reject the
null hypothesis (all slope coefficients are zero) in favor of the
alternative hypothesis (at least one slope coefficient is non-zero).
The linear regression model is valid.
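The test statistic behind this conclusion can be sketched as follows, assuming the usual definition F = MSR / MSE with MSR = SSR / k and MSE = SSE / (n – k – 1). The numbers passed in are hypothetical, chosen only to make the arithmetic easy to follow; `f_statistic` is an illustrative name.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic of the regression: MSR / MSE."""
    msr = ssr / k            # mean square for regression
    mse = sse / (n - k - 1)  # mean square for error
    return msr / mse

# Hypothetical sums of squares: MSR = 60 / 3 = 20, MSE = 40 / 20 = 2.
f_value = f_statistic(ssr=60.0, sse=40.0, n=24, k=3)
print(f_value)  # 10.0
```

The null hypothesis is rejected when the computed F exceeds the critical value of the F distribution with k and n – k – 1 degrees of freedom at the chosen significance level.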
Adjusted R2
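A sketch of the adjustment, assuming the standard formula adjusted R2 = 1 – (1 – R2)(n – 1) / (n – k – 1). R2 = 0.479 and k = 3 come from the example; the sample size n = 100 is a hypothetical value, since n is not given in the source.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 penalises R2 for the number of independent variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R2 and k from the example; n = 100 is an assumed sample size.
adj = adjusted_r_squared(r2=0.479, n=100, k=3)
print(round(adj, 4))  # 0.4627
```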
• Bathrooms b1 = – 10,05,440.73
o For each additional bathroom, the price of the flat falls by about Rs 10 lakhs, holding the other variables constant
o Does this make conventional sense?
• Bedrooms b2 = 4,59,811.58
o For each additional bedroom, the price of the flat rises by about Rs 4.6 lakhs, holding the other variables constant
• Excel Output (screenshot not reproduced; callouts on it read: Ignore, Very strong, Reasonably strong, Strongest)
Stepwise Regression
1) Forward Selection
• Start by choosing the independent variable which explains the
most variation in the dependent variable.
• Choose a second variable which explains the most residual
variation, and then recalculate regression coefficients.
• Continue until no variables "significantly" explain residual
variation.
2) Backward Selection
• Start with all the variables in the model and drop the
least "significant" one at a time, until only "significant"
variables remain.
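The two procedures can be sketched in Python as below. This is one possible reading, not a definitive implementation: "significant" is interpreted through a partial F statistic, the cutoff of 4.0 is an assumed, hypothetical threshold, and the data are synthetic. The function names `sse`, `forward_selection` and `backward_selection` are illustrative.

```python
import numpy as np

def sse(X, y, cols):
    """SSE of a least-squares fit of y on an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, f_enter=4.0):
    """Add the variable that cuts SSE the most while its partial F >= f_enter."""
    n, p = X.shape
    chosen = []
    current = sse(X, y, chosen)
    while len(chosen) < p:
        remaining = [c for c in range(p) if c not in chosen]
        trial = {c: sse(X, y, chosen + [c]) for c in remaining}
        best = min(trial, key=trial.get)
        df = n - (len(chosen) + 1) - 1  # residual df after adding `best`
        partial_f = (current - trial[best]) / (trial[best] / df)
        if partial_f < f_enter:
            break
        chosen.append(best)
        current = trial[best]
    return chosen

def backward_selection(X, y, f_remove=4.0):
    """Drop the least useful variable while its partial F < f_remove."""
    n, p = X.shape
    chosen = list(range(p))
    while chosen:
        full = sse(X, y, chosen)
        df = n - len(chosen) - 1  # residual df of the current model
        trial = {c: sse(X, y, [d for d in chosen if d != c]) for c in chosen}
        weakest = min(trial, key=trial.get)  # removal hurts the fit least
        partial_f = (trial[weakest] - full) / (full / df)
        if partial_f >= f_remove:
            break
        chosen.remove(weakest)
    return chosen

# Synthetic data: only columns 0 and 2 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
forward = forward_selection(X, y)
backward = backward_selection(X, y)
print(forward, backward)
```

Forward selection picks column 0 first because it explains the most variation, then column 2. Whether a noise column also slips in depends on the cutoff, which is why the threshold is a judgment call.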
• Categorical Regression
o Used when there is a combination of nominal, ordinal, and
interval-level independent variables.