Multiple Regression
Part 1. Basic Multiple Regression
The Linear Regression Model
The Least Squares Point Estimates
The Mean Squared Error and the Standard Error
Model Utility: R², Adjusted R², and the F Test
Testing Significance of an Independent Variable
Confidence Intervals and Prediction Intervals
Part 2. Using Squared and Interaction Terms
The Quadratic Regression Model
Interaction
06-05-10
Part 3. Dummy Variables and Advanced Statistical Inferences
Dummy Variables to Model Qualitative Variables
The Partial F Test: Testing a Portion of a Model
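The linear regression model relating y to x1, x2, …, xk is
$y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
where
$\mu_{y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk.
$\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the regression parameters relating the mean value of y to x1, x2, …, xk.
$\varepsilon$ is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk.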
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Estimation/Prediction Equation:
$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$
is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k.
b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk.
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  24.875  12.438  92.30  0.000
Residual Error   5   0.674   0.135
Total            7  25.549
$s^2 = MSE = \frac{SSE}{n-(k+1)} = \frac{0.674}{8-3} = 0.1348$
$s = \sqrt{s^2} = \sqrt{0.1348} = 0.3671$
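The arithmetic above can be checked in a few lines of Python. This is only a minimal sketch: the variable names (sse, n, k) are illustrative, and the inputs are the rounded values printed in the ANOVA table, so the last digit may differ slightly from the slide.

```python
import math

# Values read from the ANOVA table (rounded as printed).
sse = 0.674   # unexplained variation (Residual Error SS)
n = 8         # number of observations
k = 2         # number of independent variables

mse = sse / (n - (k + 1))   # mean squared error, s^2
s = math.sqrt(mse)          # standard error, s

print(f"s^2 = {mse:.4f}, s = {s:.3f}")
```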
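The Adjusted R²
$R^2 = \frac{24.875}{25.549} = 0.974$
$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-(k+1)}\right) = \left(0.974 - \frac{2}{8-1}\right)\left(\frac{8-1}{8-(2+1)}\right) = 0.963$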
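As a quick check of R² and the adjusted R², the sketch below recomputes both from the sums of squares in the ANOVA table. The variable names are illustrative, and the second formula in the code (1 minus the degrees-of-freedom-scaled unexplained share) is the algebraically equivalent form, included only as a cross-check.

```python
# Sums of squares from the ANOVA table (rounded as printed).
explained = 24.875   # Regression SS
total = 25.549       # Total SS
n, k = 8, 2          # observations, independent variables

r2 = explained / total

# Adjusted R^2 in the form used above.
adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

# Equivalent form, used here only to cross-check the algebra.
adj_r2_alt = 1 - (1 - r2) * (n - 1) / (n - (k + 1))

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```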
Test Statistic:
$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$
Test Statistic:
$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \frac{24.875/2}{0.674/(8-3)} = 92.30$
F-test at the α = 0.05 level of significance:
Reject H0 at the α level of significance, since F(model) = 92.30 > 5.79 = F.05 and p-value ≈ 0.000 < 0.05 = α.
Fα is based on 2 numerator and 5 denominator degrees of freedom.
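The decision rule can be sketched in Python as follows. The critical value 5.79 is simply copied from the text rather than computed. The p-value line uses a closed form that holds only when the numerator degrees of freedom equal 2, which happens to be the case here; it is an added convenience, not something the slides use. Because the inputs are the rounded table values, the computed F is about 92.27 rather than the printed 92.30.

```python
# Sums of squares from the ANOVA table (rounded as printed).
explained, unexplained = 24.875, 0.674
n, k = 8, 2
df_num, df_den = k, n - (k + 1)   # 2 and 5 degrees of freedom

f_model = (explained / df_num) / (unexplained / df_den)

# Critical value F.05 with (2, 5) d.f., taken from the text, not computed here.
f_crit = 5.79

# Upper-tail probability of an F(2, df_den) variable.
# ASSUMPTION: this closed form is valid ONLY when df_num == 2.
p_value = (1 + 2 * f_model / df_den) ** (-df_den / 2)

reject_h0 = f_model > f_crit and p_value < 0.05
print(f"F(model) = {f_model:.2f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```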
Model: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
Units of Additive, x   Mileage, y (MPG)
0                      25.8
0                      26.1
0                      25.4
1                      29.6
1                      29.2
1                      29.8
2                      32.0
2                      31.4
2                      31.7
3                      31.7
3                      31.5
3                      31.2
4                      29.4
4                      29.0
4                      29.5
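To illustrate how the quadratic model can be fitted to the data above, here is a minimal pure-Python sketch that solves the least squares normal equations (X'X)b = X'y with columns 1, x, x². The helper name solve and the overall layout are illustrative assumptions; in practice a statistics package would normally be used instead.

```python
# Fuel additive data from the table: units of additive (x) and mileage (y, MPG).
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [25.8, 26.1, 25.4, 29.6, 29.2, 29.8, 32.0, 31.4, 31.7,
     31.7, 31.5, 31.2, 29.4, 29.0, 29.5]


def solve(a, b):
    """Solve a small linear system a z = b by Gaussian elimination
    with partial pivoting (hypothetical helper, for illustration)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    size = len(m)
    for col in range(size):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, size))
        z[r] = (m[r][size] - tail) / m[r][r]
    return z


# Design matrix rows: (1, x, x^2).
design = [[1.0, xi, xi ** 2] for xi in x]

# Normal equations: (X'X) b = X'y.
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
       for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]

b0, b1, b2 = solve(xtx, xty)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

A negative b2 corresponds to a parabola that opens downward, consistent with mileage rising and then falling as more additive is used.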
Interaction
Modeling Interaction
Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  21412  10706  199.32  0.000
Residual Error   7    376     54
Total            9  21788
Multicollinearity
Example: The Sales Territory Performance Case (correlation matrix)
Effects
Hinders the ability to use the bj's, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.
Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)
Notes:
VIFj = 1 implies xj is not related to the other predictors
max(VIFj) > 10 suggests severe multicollinearity
mean(VIFj) substantially greater than 1 suggests severe multicollinearity
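The notes above can be illustrated with a short sketch. It assumes the standard definition VIFj = 1 / (1 − Rj²), where Rj² is the R² obtained by regressing xj on the other predictors; that formula is not shown in this section. The Rj² values below are hypothetical and are not taken from the case data.

```python
def vif(r2_j):
    """Variance inflation factor for a predictor whose R^2 against
    the other predictors is r2_j (standard definition, assumed here)."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical R_j^2 values, for illustration only.
r2_values = {"x1": 0.00, "x2": 0.50, "x3": 0.95}

vifs = {name: vif(r2) for name, r2 in r2_values.items()}
largest = max(vifs.values())
average = sum(vifs.values()) / len(vifs)

# Rule of thumb from the notes: a largest VIF above 10 suggests
# severe multicollinearity.
severe = largest > 10

for name, value in vifs.items():
    print(f"VIF({name}) = {value:.2f}")
print(f"max = {largest:.2f}, mean = {average:.2f}, severe: {severe}")
```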
Observation       Hours   Predicted   Residual  Leverage  Studentized Residual  Studentized Deleted Residual  Cook's D
          1     566.520     688.409   -121.889     0.121                -0.211                        -0.203     0.002
          2     696.820     721.848    -25.028     0.226                -0.046                        -0.044     0.000
          3   1,033.150     965.393     67.757     0.130                 0.118                         0.114     0.001
          4   1,603.620   1,172.464    431.156     0.159                 0.765                         0.752     0.028
          5   1,611.370   1,526.780     84.590     0.085                 0.144                         0.138     0.000
          6   1,613.270   1,993.869   -380.599     0.112                -0.657                        -0.642     0.014
          7   1,854.170   1,676.558    177.612     0.084                 0.302                         0.291     0.002
          8   2,160.550   1,791.405    369.145     0.083                 0.627                         0.612     0.009
          9   2,305.580   2,798.761   -493.181     0.085                -0.838                        -0.828     0.016
         10   3,503.930   4,191.333   -687.403     0.120                -1.192                        -1.214     0.049
         11   3,571.890   3,190.957    380.933     0.077                 0.645                         0.630     0.009
         12   3,741.400   4,364.502   -623.102     0.177                -1.117                        -1.129     0.067
         13   4,026.520   4,364.229   -337.709     0.064                -0.568                        -0.553     0.006
         14  10,343.810   8,713.307  1,630.503     0.146                 2.871                         4.558     0.353
         15  11,732.170  12,080.864   -348.694     0.682                -1.005                        -1.006     0.541
         16  15,414.940  15,133.026    281.914     0.785                 0.990                         0.989     0.897
         17  18,854.450  19,260.453   -406.003     0.863                -1.786                        -1.975     5.033
Leverage Values
Leverage = distance value (hi)
An observation is outlying with respect to x if it has a large leverage, greater than 2(k+1)/n.
Hospital Labor Needs Case: n = 17, k = 3, 2(3+1)/17 = 0.4706. Observations 15, 16, and 17 (leverages 0.682, 0.785, and 0.863) exceed this cutoff.
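The leverage rule can be applied mechanically to the Leverage column of the table. The sketch below is a minimal illustration; the list and variable names are assumptions, and the values are copied as printed (rounded to three decimals).

```python
# Leverage values h_i for observations 1..17, copied from the table.
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]

n = len(leverage)   # 17 observations
k = 3               # independent variables in the Hospital Labor Needs model

cutoff = 2 * (k + 1) / n

# Observation numbers (1-based) whose leverage exceeds the cutoff.
high_leverage = [i for i, h in enumerate(leverage, start=1) if h > cutoff]

print(f"cutoff = {cutoff:.4f}, high-leverage observations: {high_leverage}")
```

As a sanity check, the leverages of a model with an intercept sum to k + 1; the rounded column above adds up to 3.999, consistent with k = 3.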
$\text{Studentized Residual} = e_i' = \frac{\text{Residual}}{\text{Residual Standard Error}} = \frac{e_i}{s\sqrt{1-h_i}}$
An observation is outlying with respect to y if it has a large studentized
(or standardized) residual, |StRes| greater than 2
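A minimal sketch of this rule follows. The function studentized_residual implements the formula for a given residual, standard error s, and leverage; since s for the hospital model is not stated in this section, the function is shown but not applied to the case data. The flagging step instead uses the Studentized Residual column of the table directly. All names are illustrative.

```python
import math


def studentized_residual(e, s, h):
    """Studentized residual e / (s * sqrt(1 - h)) for residual e,
    standard error s, and leverage h."""
    return e / (s * math.sqrt(1.0 - h))


# Studentized residuals for observations 1..17, copied from the table.
st_res = [-0.211, -0.046, 0.118, 0.765, 0.144, -0.657, 0.302, 0.627, -0.838,
          -1.192, 0.645, -1.117, -0.568, 2.871, -1.005, 0.990, -1.786]

# Observation numbers (1-based) with |StRes| greater than 2.
outlying_in_y = [i for i, r in enumerate(st_res, start=1) if abs(r) > 2]

print("Outlying with respect to y:", outlying_in_y)
```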
Cook’s Distance
$\text{Cook's Distance} = D_i = \frac{e_i^2}{(k+1)s^2}\left[\frac{h_i}{(1-h_i)^2}\right]$
An observation is influential with respect to the estimated regression parameters b0, b1, …, bk if it has a large Cook's distance, Di greater than F.50 [with k+1 and n-(k+1) d.f.].
Hospital Labor Needs Case: k + 1 = 3 + 1 = 4 and n − (k + 1) = 17 − 4 = 13 d.f., so the cutoff is F.50 = 0.8845. Observations 16 (D = 0.897) and 17 (D = 5.033) exceed it.
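Cook's distance can be recomputed from the table without knowing s. Substituting ei = ei′ · s · √(1 − hi) into the formula for Di gives Di = (ei′)² hi / [(k + 1)(1 − hi)]. The sketch below uses that rearrangement, which is an algebraic consequence of the two formulas rather than something shown on the slides. Because the table values are rounded, the results agree with the Cook's D column only to about two decimals. The cutoff 0.8845 is copied from the text.

```python
k = 3          # independent variables in the Hospital Labor Needs model
f50 = 0.8845   # F.50 with 4 and 13 d.f., taken from the text

# Leverage and studentized residual columns for observations 1..17.
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
st_res = [-0.211, -0.046, 0.118, 0.765, 0.144, -0.657, 0.302, 0.627, -0.838,
          -1.192, 0.645, -1.117, -0.568, 2.871, -1.005, 0.990, -1.786]

# D_i = r_i^2 * h_i / ((k + 1) * (1 - h_i)), with r_i the studentized residual.
cooks_d = [r ** 2 * h / ((k + 1) * (1 - h))
           for r, h in zip(st_res, leverage)]

# Observation numbers (1-based) whose Cook's distance exceeds F.50.
influential = [i for i, d in enumerate(cooks_d, start=1) if d > f50]

for i in influential:
    print(f"observation {i}: D = {cooks_d[i - 1]:.3f}")
```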