Multiple Regression Slides
Definition

Multiple Regression Equation

A linear relationship between a dependent variable y and two or more independent variables (x1, x2, x3, . . . , xk).
Model:

y = β0 + β1x1 + β2x2 + . . . + βkxk + e

Estimated equation:

ŷ = b0 + b1x1 + b2x2 + . . . + bkxk

(General form of the estimated multiple regression equation)
Notation

n = sample size
k = number of independent variables
ŷ = predicted value of the dependent variable y
x1, x2, x3, . . . , xk are the independent variables
β0 = the y-intercept, or the value of y when all of the predictor variables are 0
β1, β2, β3, . . . , βk are the population coefficients of the independent variables x1, x2, x3, . . . , xk
b0 = estimate of β0 based on the sample data
b1, b2, b3, . . . , bk are the sample estimates of the coefficients β1, β2, β3, . . . , βk
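The estimated equation turns coefficient estimates into predictions by summing each coefficient times its predictor. A minimal Python sketch; the coefficients and x values below are hypothetical illustrations, not taken from the slides' data:

```python
# Predicted value from an estimated multiple regression equation:
# y_hat = b0 + b1*x1 + ... + bk*xk
# All numbers here are made-up examples.
def predict(b0, coefs, xs):
    """b0: intercept estimate; coefs: [b1, ..., bk]; xs: [x1, ..., xk]."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

y_hat = predict(10.0, [2.0, -0.5], [3.0, 4.0])  # 10 + 6 - 2 = 14.0
```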
[Scatterplots (figure residue): CRIME rate (y-axis, 0–140) plotted against two predictors (x-axes, 0–100 and 0–120)]
Analysis of Variance (SAS output excerpt):

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     2            24732         12366     28.54   <.0001
(R = .686)

Parameter Estimates (excerpt):

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             59.11807         28.36531      2.08     0.0411

Dependent Mean   52.40299     Adj R-Sq   0.4549
Adjusted R²

Adjusted R² = 1 − (1 − R²) · (n − 1) / [n − (k + 1)]
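The adjusted R² formula can be checked numerically. In the sketch below, n = 67 is an assumption (it is consistent with the denominator DF of 63 shown later for a 3-predictor model); with R² = 0.4792 it reproduces the Adj R-Sq of 0.4544 printed in the interaction-model output:

```python
def adjusted_r2(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / [n - (k + 1)]
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

adj = adjusted_r2(0.4792, 67, 3)  # matches the printed Adj R-Sq 0.4544
```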
Parameter Estimates (excerpt):

Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
incom       1             -0.38309          0.94053     -0.41     0.6852
hs          1             -0.46729          0.55443     -0.84     0.4025

Overall F test:

F(k, n − k − 1) = (R² / k) / [(1 − R²) / (n − k − 1)]
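The overall F statistic is a direct ratio of explained to unexplained variance per degree of freedom. A sketch of the formula; the input values in the call are illustrative placeholders:

```python
def overall_f(r2, n, k):
    # F(k, n-k-1) = (R^2 / k) / [(1 - R^2) / (n - k - 1)]
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Illustrative values only: R^2 = 0.5, n = 10, one predictor.
f = overall_f(0.5, 10, 1)
```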
Individual Tests

• Test for b1:
  Full:    Y = b0 + b1x1 + b2x2 + b3x3 + e
  Reduced: Y = b0 + b2x2 + b3x3 + e
• Test for b2:
  Full:    Y = b0 + b1x1 + b2x2 + b3x3 + e
  Reduced: Y = b0 + b1x1 + b3x3 + e
• Test for b3:
  Full:    Y = b0 + b1x1 + b2x2 + b3x3 + e
  Reduced: Y = b0 + b1x1 + b2x2 + e

F Test for Restricted Models (full model with k predictors vs. reduced model with g predictors):

F(k − g, n − k − 1) = [(R²f − R²r) / (k − g)] / [(1 − R²f) / (n − k − 1)]
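The restricted-model F compares the R² of the full and reduced models. A sketch of the formula; the values in the example call are made up:

```python
def restricted_f(r2_full, r2_reduced, k, g, n):
    # F(k-g, n-k-1) = [(R2_f - R2_r) / (k - g)] / [(1 - R2_f) / (n - k - 1)]
    num = (r2_full - r2_reduced) / (k - g)
    den = (1 - r2_full) / (n - k - 1)
    return num / den

# Illustrative: full model (k=3) vs. reduced (g=1), hypothetical R^2 values.
f = restricted_f(0.6, 0.4, 3, 1, 14)
```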
More on SAS

• Model options:
  Model y = x1 x2 / r partial p stb;
  – r: residual analysis
  – partial: partial regression scatter plots
  – p: predicted values
  – stb: standardized regression weights

Standardized Regression Weights

b*i = bi (Sxi / Sy)

Generally, the standardized regression weights fall between −1 and 1. However, they can be larger than 1 (or less than −1).
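The standardized weight rescales a raw coefficient by the ratio of the predictor's and response's sample standard deviations. A sketch using the standard library; the data values are hypothetical:

```python
import statistics

def standardized_weight(b, x_values, y_values):
    # b*_i = b_i * (S_xi / S_y), using sample standard deviations
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)

# Illustrative: S_x = sqrt(2), S_y = 2*sqrt(2), so b* = b * 0.5
w = standardized_weight(1.0, [1.0, 3.0], [0.0, 4.0])
```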
Partial Regression Plots

• Model Statement option: / partial
• The slope in a partial regression plot corresponds to the regression weight in the overall model.

[Partial regression plot (figure residue): Crate-b1(inc)-b2(urb) plotted against Hs-b1(inc)-b2(urb); x-axis hs, −14 to 14]
Interaction

• An interaction exists when the relationship (slope) between one variable and the response changes as the levels of another variable change.
• Consider crime rate as a function of hs and urb. If the relationship (slope) between crime rate and urb changes with the level of hs, there is an interaction between urb and hs.

[Partial regression plot (figure residue): crate plotted against Urb-b1(inc)-b2(hs)]
• We test for an interaction effect by comparing a model with interaction to a model without interaction:
  Y = β0 + β1x1 + β2x2 + β3(x1·x2) + e
  Y = β0 + β1x1 + β2x2 + e
• Create the product variable in the data step:
  – Data new; input y x z; xz=x*z; cards;
  – Model y = x z xz;
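The product-variable step mirrors the SAS data step above: the interaction column is just the elementwise product of the two predictors. A Python sketch with made-up x and z columns:

```python
# Mirror of the SAS data step: xz = x * z creates the interaction column.
# x and z below are hypothetical data, not the slides' crime data.
x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]
xz = [xi * zi for xi, zi in zip(x, z)]  # interaction term x*z
# The interaction model then regresses y on x, z, and xz; testing beta3
# is the restricted-model F test with k - g = 1.
```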
Parameter Estimates (interaction-model output excerpt):

Root MSE         20.82583     R-Square   0.4792
Dependent Mean   52.40299     Adj R-Sq   0.4544

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             19.31754         49.95871      0.39     0.7003
[Residual plots (figure residue): Residual by hs (x-axis 50–85), Residual by Urb, and Residual by income]
SAS Setup

proc reg;
model crate = incom hs urb;
output out=new p=yhat r=yresid;
proc plot data=new;
plot yresid*hs;
plot yresid*urb;
plot yresid*incom;
run;

Assumptions

• Linearity: the relationship between the dependent variable and the independent variables is linear.
• Normality: y is independently normally distributed (independently distributed random errors with a mean of zero).
• Homoskedasticity: the conditional variances of y given x are all equal.
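The proc reg / output steps compute fitted values and residuals. A pure-Python sketch of the same computation via the normal equations, assuming an intercept plus any number of predictors; the data in the test is made up, not the slides' crime data:

```python
def fit_ols(rows, y):
    """OLS via normal equations. rows: list of predictor lists (no intercept
    column); y: response list. Returns (coefficients [b0, b1, ...], residuals)."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    n, p = len(X), len(X[0])
    # Normal equations: (X'X) b = X'y
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    M = [row[:] + [v] for row, v in zip(XtX, Xty)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    b = [M[i][p] / M[i][i] for i in range(p)]
    fitted = [sum(bi * xi for bi, xi in zip(b, X[i])) for i in range(n)]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return b, resid
```

The residuals can then be plotted against each predictor, as the proc plot statements do.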
Collinearity Diagnostics

Using the Multicollinearity Indices

• Look for condition indices larger than 30.
• If an index is larger than 30, identify the variables involved.
• VIF: the variance of the weight is inflated by this quantity.

R²i.rest = (VIF − 1) / VIF

Parameter Estimates (excerpt):

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1             59.71473         28.58953      2.09     0.0408                    0

Coeff Var   6.79794
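The slide's relation between the VIF and the R² of a predictor regressed on the rest can be used in either direction. A sketch; the example value 0.9 is an illustrative placeholder:

```python
def vif_from_r2(r2_rest):
    # VIF_i = 1 / (1 - R^2_{i.rest})
    return 1.0 / (1.0 - r2_rest)

def r2_from_vif(vif):
    # Inverse relation from the slide: R^2_{i.rest} = (VIF - 1) / VIF
    return (vif - 1.0) / vif

# Illustrative: a predictor 90% explained by the others has VIF = 10.
v = vif_from_r2(0.9)
```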
Test 1 Results for Dependent Variable crate (SAS output excerpt):

Source        DF   Mean Square   F Value   Pr > F
Numerator      2     366.72473      0.84   0.4385
Denominator   63     439.00995

Partial Correlation

• In a partial correlation, a variable is partialed out of both variables: rx1x2.x3

r²12.3 = (R²1.23 − R²1.3) / (1 − R²1.3)
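The squared partial correlation compares how much of the remaining variance the added variable explains. A sketch of the formula; the R² values in the example call are hypothetical:

```python
def squared_partial(r2_full, r2_reduced):
    # r^2_{12.3} = (R^2_{1.23} - R^2_{1.3}) / (1 - R^2_{1.3})
    return (r2_full - r2_reduced) / (1 - r2_reduced)

# Illustrative: adding x2 raises R^2 from 0.5 to 0.6, so x2 explains
# 20% of the variance left unexplained by x3.
rp2 = squared_partial(0.6, 0.5)
```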
[Diagram (figure residue): partial correlation Ry1y2.xz between y1 and y2 with x and z partialed out]

[Partial regression plot (figure residue): Crate-b1(inc)-b2(hs) plotted against Urb-b1(inc)-b2(hs); x-axis Urb, −60 to 60]