Multiple Linear Regression
The multiple linear regression model is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

A model that includes interaction and quadratic terms takes the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \cdots + \beta_k x_k + \varepsilon$$
In matrix notation, the model for $n$ observations is

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

that is, $Y = X\beta + \varepsilon$.
Ordinary Least Squares Estimation for Multiple Linear Regression
The variance of the residuals, Var(ε_i | X_i), is constant for all values of X_i. When the variance of the residuals is constant for different values of X_i, it is called homoscedasticity. A non-constant variance of residuals is called heteroscedasticity.
The hat matrix plays a crucial role in identifying outliers and influential observations in the sample.
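As a minimal sketch of OLS estimation and the hat matrix, assuming hypothetical simulated data (the variable names and numbers below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 explanatory variables
np.random.seed(0)
X = np.column_stack([np.ones(6), np.random.rand(6), np.random.rand(6)])  # design matrix with intercept
y = X @ np.array([2.0, 1.5, -0.5]) + 0.1 * np.random.randn(6)

# OLS estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X'; it maps y to the fitted values y_hat = H y
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

# The diagonal element h_ii is the leverage of observation i;
# unusually large leverage flags a potentially influential observation
leverage = np.diag(H)
print(beta_hat, leverage)
```

The hat matrix is symmetric and idempotent, and its trace equals the number of estimated parameters (k + 1), which is why leverage values are often compared against the rule-of-thumb cutoff 2(k + 1)/n.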
Multiple Linear Regression Model Building
A few examples of MLR are as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_i$$
Partial Correlation
• Partial correlation is the correlation between the response
variable Y and the explanatory variable X1 when influence of X2
is removed from both Y and X1 (in other words, when X2 is kept
constant).
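The definition above can be computed directly: regress both Y and X1 on X2, and correlate the two sets of residuals. The following sketch uses hypothetical simulated data; `partial_corr` is an illustrative helper, not a function from the text.

```python
import numpy as np

def partial_corr(y, x1, x2):
    """Correlation between y and x1 after the linear influence of x2
    has been removed from both (i.e., holding x2 constant)."""
    Z = np.column_stack([np.ones_like(x2), x2])   # regressors: intercept and x2
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]     # residuals of y on x2
    rx1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]  # residuals of x1 on x2
    return np.corrcoef(ry, rx1)[0, 1]

# Hypothetical example: x1 and y both depend on x2
rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
x1 = x2 + rng.normal(size=100)
y = 2 * x1 + 3 * x2 + rng.normal(size=100)
print(partial_corr(y, x1, x2))
```

Because y depends on x1 even after x2 is held constant, the partial correlation here remains strongly positive, whereas it would be near zero if x1 influenced y only through x2.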
Model Summary
Obs   Category   HS   UG   PG   Salary
 1       1        1    0    0     9800
11       2        0    1    0    17200
19       3        0    0    1    18500
27       4        0    0    0     7700
The corresponding regression model is as follows:
$$Y = \beta_0 + \beta_1 \times \text{HS} + \beta_2 \times \text{UG} + \beta_3 \times \text{PG}$$
Note that in Table 10.4, all the dummy variables are statistically significant at α = 0.01, since the p-values are less than 0.01.
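A minimal sketch of fitting this dummy-variable model, using hypothetical simulated salaries (group sizes, noise level, and variable names are illustrative assumptions, not the book's data):

```python
import numpy as np

# Hypothetical salaries for four education groups, 10 workers each:
# baseline (none of HS/UG/PG), HS, UG, PG
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2, 3], 10)            # 0 = baseline, 1 = HS, 2 = UG, 3 = PG
means = np.array([7700.0, 9800.0, 17200.0, 18500.0])
salary = means[groups] + rng.normal(0, 300, size=40)

# Dummy coding: one column per non-baseline category
HS = (groups == 1).astype(float)
UG = (groups == 2).astype(float)
PG = (groups == 3).astype(float)
X = np.column_stack([np.ones(40), HS, UG, PG])

beta = np.linalg.lstsq(X, salary, rcond=None)[0]
# beta[0] estimates the baseline mean salary; beta[1..3] estimate each
# category's salary difference relative to the baseline
print(beta)
```

Leaving one category out of the dummy set is what makes the model estimable: the omitted category becomes the baseline absorbed by the intercept.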
Interpretation of Regression Coefficients of
Categorical Variables
That is, the change in salary for female workers when WE increases by one year is 609.639, whereas for male workers it is 3523.547. In other words, salary for male workers increases at a higher rate compared to female workers. Interaction variables are an important class of derived variables in regression model building.
Validation of Multiple Regression Model
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \overline{Y})^2}$$
• SSE is the sum of squares of errors and SST is the sum of squares of total deviation. In the case of MLR, SSE will decrease as the number of explanatory variables increases, while SST remains constant.
$$\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$
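The two formulas above can be computed directly from a fitted model. The following sketch uses hypothetical simulated data; `r_squared` is an illustrative helper, not a function from the text.

```python
import numpy as np

def r_squared(y, y_hat, k):
    """R-square and adjusted R-square for a fit with k explanatory variables."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)         # sum of squares of errors
    sst = np.sum((y - y.mean()) ** 2)      # sum of squares of total deviation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2

# Hypothetical fit with one explanatory variable
rng = np.random.default_rng(3)
x = rng.uniform(size=50)
y = 1 + 2 * x + rng.normal(0, 0.1, size=50)
X = np.column_stack([np.ones(50), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
print(r_squared(y, y_hat, k=1))
```

Because adjusted R-square divides SSE by n − k − 1, it penalizes each added explanatory variable and, unlike R-square, can fall when a variable contributes too little.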
The null and alternative hypotheses for the relationship between an individual independent variable X_i and the dependent variable Y are given, respectively, by

• H0: β_i = 0
• HA: β_i ≠ 0

The corresponding test statistic is given by

$$t = \frac{\hat{\beta}_i - 0}{Se(\hat{\beta}_i)} = \frac{\hat{\beta}_i}{Se(\hat{\beta}_i)}$$
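A minimal sketch of computing these t statistics by hand, assuming hypothetical simulated data: the standard error of each coefficient is the square root of the corresponding diagonal entry of σ̂²(X'X)⁻¹, where σ̂² = SSE/(n − k − 1).

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
# True coefficients: the second explanatory variable has no effect
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)                     # unbiased error-variance estimate
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # Se(beta_hat_i)

t_stats = beta_hat / se   # t = (beta_hat_i - 0) / Se(beta_hat_i)
print(t_stats)
```

Under H0, each statistic follows a t distribution with n − k − 1 degrees of freedom, so a coefficient is declared significant when its |t| exceeds the critical value for the chosen α.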
Residual Analysis in Multiple Linear Regression
Residual analysis is important for checking the assumptions about the normal distribution of residuals and homoscedasticity, and for validating the functional form of a regression model.
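These checks can be sketched numerically as follows, assuming hypothetical simulated data; the half-split spread comparison is only a crude illustration of a homoscedasticity check (formal tests such as Breusch–Pagan exist).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3 + 0.5 * x + rng.normal(0, 1, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Standardized residuals: values far outside +/- 3 suggest outliers
std_resid = (resid - resid.mean()) / resid.std(ddof=1)

# Crude homoscedasticity check: compare residual spread in the lower
# and upper halves of the fitted values
order = np.argsort(X @ beta_hat)
lo, hi = resid[order[: n // 2]], resid[order[n // 2:]]
print(std_resid.max(), lo.std(), hi.std())
```

With an intercept in the model, OLS residuals sum to zero by construction, so residual plots are examined for patterns in spread and shape rather than in level.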