Chapter 14, Multiple Regression Using Dummy Variables
The estimated multiple regression equation with two explanatory variables is

Ŷ = b0 + b1X1 + b2X2

where Ŷ is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1 and b2 are the estimated slope coefficients.

Graph of a Two-Variable Model
[Three-dimensional plot: the fitted plane Ŷ = b0 + b1X1 + b2X2, with Y on the vertical axis and X1 and X2 on the horizontal axes]
Example:

Simple Regression Results

                 Coefficients   Standard Error   t Stat
Intercept (b0)   165.0333581    16.50316094      10.000106
Lotsize (b1)     6.931792143    2.203156234      3.1463008

F-Value 9.89
Adjusted R Square 0.108
Standard Error 36.34

Multiple Regression Results

                 Coefficients   Standard Error   t Stat
Intercept        59.32299284    20.20765695      2.935669
Lotsize          3.580936283    1.794731507      1.995249
Rooms            18.25064446    2.681400117      6.806386

F-Value 31.23
Adjusted R Square 0.453
Standard Error 28.47

Check the size and significance level of the coefficients, the F-value, the R-Square, etc. You will see what the net effects are.
Using the Equation to Make Predictions

Predict the appraised value at average lot size (7.24) and average number of rooms (7.12):

App. Val. = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, or $215,180

What is the total effect of a 2,000 sf increase in lot size and 2 additional rooms? (Lot size is measured in thousands of square feet, so a 2,000 sf increase is 2 units.)

Increase in app. value = (3.58)(2) + (18.25)(2) = 43.66, or $43,660
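As a quick check of the arithmetic above, a short Python sketch using the rounded coefficients from the table (lot size assumed to be in thousands of square feet):

```python
# Rounded coefficients from the multiple regression table
b0, b_lot, b_rooms = 59.32, 3.58, 18.25

def appraised_value(lotsize, rooms):
    """Predicted appraised value in $1,000s (lotsize in 1,000s of sf)."""
    return b0 + b_lot * lotsize + b_rooms * rooms

# Prediction at the averages: lot size 7.24, rooms 7.12
pred = appraised_value(7.24, 7.12)
print(round(pred, 2))                 # 215.18 -> $215,180

# Total effect of +2,000 sf of lot (= 2 units) and +2 rooms
effect = b_lot * 2 + b_rooms * 2
print(round(effect, 2))               # 43.66 -> $43,660
```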
Coefficient of Multiple Determination, r² and Adjusted r²

r² reports the proportion of total variation in Y explained by all X variables taken together (the model):

r²_Y.12..k = SSR / SST = regression sum of squares / total sum of squares

Adjusted r²

r² never decreases when a new X variable is added to the model
This can be a disadvantage when comparing models
What is the net effect of adding a new variable?
We lose a degree of freedom when a new X variable is added
Did the new X variable add enough explanatory power to offset
the loss of one degree of freedom?
Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

r²_adj = 1 − (1 − r²_Y.12..k) × (n − 1) / (n − k − 1)

(where n = sample size, k = number of independent variables)

Penalizes excessive use of unimportant independent variables
Smaller than r²
Useful in comparing models
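The adjustment can be sketched in Python; the sample values below (r² = 0.50, n = 52, k = 2) are made-up illustrations, not numbers from the example above:

```python
def adjusted_r2(r2, n, k):
    """r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# e.g. r2 = 0.50 with n = 52 observations and k = 2 predictors
print(round(adjusted_r2(0.50, 52, 2), 4))   # slightly below 0.50
```

Note that the adjusted value is always below r² (for k ≥ 1), which is how the formula penalizes adding weak predictors.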
Multiple Regression Assumptions
Assumptions:
The errors are normally distributed
Errors have a constant variance
The model errors are independent
Errors (residuals) from the regression model: e_i = (Y_i − Ŷ_i)

These residual plots are used in multiple regression:
Residuals vs. Ŷ_i
Residuals vs. X_1i
Residuals vs. X_2i
Residuals vs. time (if time series data)
Two-variable model

[Three-dimensional plot: the fitted plane Ŷ = b0 + b1X1 + b2X2 over the x_1i and x_2i axes, a sample observation Y_i, its predicted value Ŷ_i on the plane, and the residual e_i = (Y_i − Ŷ_i)]

The best-fit equation, Ŷ, is found by minimizing the sum of squared errors, Σe².
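A minimal sketch of the least-squares idea, using a tiny made-up data set and NumPy's least-squares solver: any coefficients other than the fitted ones give a larger sum of squared errors.

```python
import numpy as np

# Small illustrative data set (not from the slides)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([6.1, 6.9, 12.2, 11.8, 17.1])

X = np.column_stack([np.ones_like(x1), x1, x2])   # columns: 1, X1, X2
b, *_ = np.linalg.lstsq(X, y, rcond=None)         # fitted b0, b1, b2

def sse(coeffs):
    e = y - X @ coeffs                            # residuals e_i = Y_i - Yhat_i
    return float(e @ e)

print(sse(b))                                 # minimal sum of squared errors
print(sse(b + np.array([0.1, 0.0, 0.0])))     # perturbing any coefficient raises SSE
```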
Are Individual Variables Significant?

Use t-tests of individual variable slopes. The test shows whether there is a linear relationship between the variable X_i and Y.

Hypotheses:
H_0: β_i = 0 (no linear relationship)
H_1: β_i ≠ 0 (linear relationship does exist between X_i and Y)

Test statistic (with n − k − 1 degrees of freedom):

t_(n−k−1) = (b_i − 0) / S_(b_i)

Confidence interval for the population slope β_i:

b_i ± t_(n−k−1) · S_(b_i)
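For example, using the Rooms row of the multiple regression table above, the t statistic is just the coefficient divided by its standard error:

```python
# Rooms row of the multiple regression table
b_rooms, se_rooms = 18.25064446, 2.681400117

t_stat = (b_rooms - 0) / se_rooms
print(round(t_stat, 6))   # matches the table's t Stat of 6.806386
```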
Using Dummy Variables
A dummy variable is a categorical
explanatory variable with two levels:
yes or no, on or off, male or female
coded as 0 or 1
Regression intercepts are different if the
variable is significant
Assumes equal slopes for other variables
If more than two levels, the number of
dummy variables needed is (number of
levels - 1)
Different Intercepts, Same Slope

[Graph: Y (sales) vs. X1; two parallel lines with slope b1, one with intercept b0 + b2 (Fire Place) and one with intercept b0 (No Fire Place)]

Fire Place (X2 = 1):    Ŷ = b0 + b1X1 + b2(1) = (b0 + b2) + b1X1
No Fire Place (X2 = 0): Ŷ = b0 + b1X1 + b2(0) = b0 + b1X1

If H_0: β_2 = 0 is rejected, then Fire Place has a significant effect on Values
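A small sketch of the intercept shift (the coefficients here are illustrative placeholders, not fitted values):

```python
# Illustrative coefficients; b2 multiplies the fireplace dummy X2
b0, b1, b2 = 50.0, 3.5, 12.0

def predict(x1, fireplace):
    """X2 = 1 if the house has a fireplace, else 0."""
    return b0 + b1 * x1 + b2 * (1 if fireplace else 0)

# Same slope b1 for both groups; the gap between the two lines
# equals b2 at every value of X1
print(predict(7.0, True) - predict(7.0, False))   # 12.0 = b2
print(predict(9.0, True) - predict(9.0, False))   # 12.0 = b2
```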
Interaction Between Explanatory
Variables
Hypothesizes interaction between pairs of X variables
Response to one X variable may vary at different levels of
another X variable
Contains two-way cross product terms
Effect of Interaction

Ŷ = b0 + b1X1 + b2X2 + b3X3
  = b0 + b1X1 + b2X2 + b3(X1X2)

Without the interaction term, the effect of X1 on Y is measured by β1
With the interaction term, the effect of X1 on Y is measured by β1 + β3X2
The effect changes as X2 changes
Example: Suppose X2 is a dummy variable and the estimated regression equation is

Ŷ = 1 + 2X1 + 3X2 + 4X1X2

Slopes are different if the effect of X1 on Y depends on the X2 value:

X2 = 1: Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

[Graph: the two lines plotted against X1 over the range 0 to 1.5]
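The example equation can be checked directly: the slope on X1 is 2 + 4X2, so it differs across the dummy's two levels.

```python
# Estimated equation from the example: Yhat = 1 + 2*X1 + 3*X2 + 4*X1*X2
def yhat(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_x1(x2):
    """Effect of X1 on Y: beta1 + beta3 * X2 = 2 + 4 * X2."""
    return 2 + 4 * x2

print(slope_x1(0), slope_x1(1))   # 2 when X2 = 0, 6 when X2 = 1
print(yhat(0, 1), yhat(1, 1))     # X2 = 1 line is 4 + 6*X1 -> 4, 10
```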