Chapter 13
Multiple Regression
13.1 a. ŷ = 4.306 − 0.082ShipCost + 2.265PrintAds + 2.498WebAds + 16.697Rebate%
b. The coefficient of ShipCost says that each additional $1 of shipping cost reduces net
revenue by about $0.082.
The coefficient of PrintAds says that each additional $1000 of printed ads adds about
$2,265 to net revenue.
The coefficient of WebAds says that each additional $1000 of web ads adds about
$2,498 to net revenue.
The coefficient of Rebate% says that each additional percentage in the rebate rate
adds about $16,700 to net revenue.
c. The intercept is meaningless. You have to supply some product, so shipping cost
can’t be zero. You don’t have to have a rebate or ads; those can be zero.
d. NetRevenue = 4.306 − 0.082(10) + 2.265(50) + 2.498(40) + 16.697(15) = 467.111
thousand, or $467,111.
Learning Objective: 13-1
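The point prediction in 13.1(d) can be checked with a minimal sketch (the dictionary names are ours; the coefficients are those reported in part a):

```python
# Fitted coefficients from 13.1(a); x holds the values plugged in for 13.1(d).
coefs = {"Intercept": 4.306, "ShipCost": -0.082, "PrintAds": 2.265,
         "WebAds": 2.498, "Rebate%": 16.697}
x = {"ShipCost": 10, "PrintAds": 50, "WebAds": 40, "Rebate%": 15}

# Linear prediction: intercept plus the sum of coefficient * value.
y_hat = coefs["Intercept"] + sum(coefs[k] * v for k, v in x.items())
print(round(y_hat, 3))  # net revenue in $ thousands
```

Multiplying out each term and summing reproduces the 467.111 thousand reported above.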
13.9 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)
Predictor     tcalc        p-value
Intercept     0.0608585    0.9517414
ShipCost     -0.0175289    0.9860922
PrintAds      2.1571429    0.0363725
WebAds        2.9537661    0.0049772
Rebate%       4.6770308
b. t.005 = T.INV(.005, 45) = ±2.69. WebAds and Rebate% differ significantly from zero
(p-value < .01 and |tcalc| > 2.69).
c. See table in part a.
Learning Objective: 13-3
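The Excel-style two-tailed p-values used throughout these answers can be reproduced with scipy (a sketch; the helper name is ours):

```python
from scipy import stats

# tcalc = (bj - 0)/sj; Excel's T.DIST.2T(|tcalc|, df) is the two-tailed
# p-value of Student's t, mirrored here with scipy's survival function.
def p_two_tail(t_calc, df):
    return 2 * stats.t.sf(abs(t_calc), df)

# The WebAds row of 13.9: tcalc = 2.9537661 with df = 45.
print(round(p_two_tail(2.9537661, 45), 7))
```

The printed value matches the .0049772 p-value in the table above.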
13.10 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 26)
13.11 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 497)
13.12 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)
13.13 Use ŷi ± tα/2se with 34 df, t.025 = T.INV(.025, 34) = 2.032. (Use the positive value.)
Half width of 95% prediction interval = 2.032(3620) =7355.84
Using the quick rule the half width =2se = 2(3620) = 7240
Yes, the quick rule gives similar results.
Learning Objective: 12-9
13.14 Use ŷi ± tα/2se with 20 df, t.025 = T.INV(.025, 20) = 2.086. (Use the positive value.)
Half width of 95% prediction interval =2.086 (1.17) =2.4406
Using the quick rule the half width = 2se = 2(1.17) = 2.34
Yes, the quick rule gives similar results.
Learning Objective: 12-9
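The half-width calculations in 13.13 and 13.14 can be sketched together (helper name is ours; scipy's ppf(0.975, df) matches |T.INV(.025, df)|):

```python
from scipy import stats

# 95% prediction-interval half-width t.025*se versus the "quick rule" 2*se.
def half_widths(se, df):
    t = stats.t.ppf(0.975, df)       # positive two-tailed critical value
    return t * se, 2 * se            # (exact half-width, quick-rule half-width)

print(half_widths(3620, 34))  # 13.13: roughly (7356, 7240)
print(half_widths(1.17, 20))  # 13.14: roughly (2.44, 2.34)
```

In both problems the quick rule lands close to the exact half-width, as noted above.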
13.15 a. Number of nights (NumNight) needed and number of bedrooms (NumBedrooms) are
both discrete.
b. Two: SwimPool = 1 if there is a swimming pool and ParkGarage = 1 if there is a
parking garage
c. CondoPrice = β0 + β1NumNights + β2NumBedrooms + β3SwimPool + β4ParkGarage
Learning Objective: 13-3
Learning Objective: 13-5
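The two binaries in 13.15(b) enter the model as 0/1 columns; a small sketch with made-up condo rows (not real data) shows the encoding:

```python
import pandas as pd

# Hypothetical condo listings; HasPool/HasGarage are raw booleans that
# become the 0/1 regressors SwimPool and ParkGarage from 13.15(b).
df = pd.DataFrame({"NumNights": [3, 7], "NumBedrooms": [2, 3],
                   "HasPool": [True, False], "HasGarage": [False, True]})
df["SwimPool"] = df["HasPool"].astype(int)
df["ParkGarage"] = df["HasGarage"].astype(int)
print(df[["SwimPool", "ParkGarage"]].values.tolist())  # [[1, 0], [0, 1]]
```

These indicator columns sit alongside the quantitative predictors in the design matrix for the model in part c.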
13.19 a. The scatter plot shows an obvious increasing trend but it is nonlinear rather than
linear. The increase in salary is much steeper in the earlier years than in the later
years. A nonlinear model would be appropriate.
b. MegaStat Output is below: R2 = .915, Fcalc =194.99, p-value = .0000. Yes, the model is
significant.
Regression Analysis
R² 0.915
Adjusted R² 0.911 n 39
R 0.957 k 2
Std. Error 8.757 Dep. Var. Salary ($1000)
ANOVA table
Source        SS           df   MS           F        p-value
Regression    29,901.9728   2   14,950.9864  194.99   4.84E-20
Residual       2,760.3861  36       76.6774
Total         32,662.3590  38
c. Years: p-value = .0000, YearsSq: p-value = .0000. Both of these predictors are
significant.
Learning Objective: 13-3
13.20 MegaStat Output is below: Male: p-value = .5009, YearsxMale: p-value = .0505. The
binary variable Male is not significant. The interaction variable YearsxMale is
significant because its p-value is less than .10. The coefficient on the interaction term
is positive, which means that as men gain more years of experience their salaries tend
to rise relative to women’s.
Regression Analysis
R² 0.945
Adjusted R² 0.939 n 39
R 0.972 k 4
Std. Error 7.269 Dep. Var. Salary ($1000)
ANOVA table
Source        SS           df   MS          F        p-value
Regression    30,865.7542   4   7,716.4386  146.03   6.59E-21
Residual       1,796.6048  34      52.8413
Total         32,662.3590  38
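The interaction regressor tested in 13.20 is just the product of the two columns; a toy sketch (not the salary data set):

```python
import pandas as pd

# Hypothetical rows; the interaction column YearsxMale is Years * Male,
# so it is zero for women and equals Years for men.
df = pd.DataFrame({"Years": [2, 5, 5], "Male": [1, 0, 1]})
df["YearsxMale"] = df["Years"] * df["Male"]
print(df["YearsxMale"].tolist())  # [2, 0, 5]
```

Because the column is zero whenever Male = 0, its coefficient measures how the slope on Years differs for men.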
13.21 a. All but one pair of variables is significantly correlated at α = .01. See the matrix
below. LiftOps and Scanners (r = .635), Crowds and LiftWait (r = .577), AmountGr
and TrailGr (r = .531), SkiSafe and SpSeen (r = .488)
Correlation Matrix
          scanners liftops liftwait trailv snosurf crowds amountgr trailgr skisafe spseen hosts
scanners    1.000
liftops      .635   1.000
liftwait     .146    .180   1.000
trailv       .115    .206    .128   1.000
snosurf      .190    .242    .227    .373   1.000
crowds       .245    .299    .577    .235    .348   1.000
amountgr     .245    .271    .251    .221    .299    .372   1.000
trailgr      .266    .337    .205    .360    .358    .362    .531   1.000
skisafe      .200    .306    .196    .172    .200    .332    .274    .323   1.000
spseen       .145    .190    .207    .172    .184    .230    .149    .172    .488   1.000
hosts        .245    .278    .046    .140    .119    .133    .128    .156    .212    .350   1.000
502 sample size
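The significance screen behind 13.21(a) uses tcalc = r√(n − 2)/√(1 − r²) with n − 2 df; a sketch (helper name is ours), applied to the LiftOps and Scanners correlation:

```python
from math import sqrt
from scipy import stats

# Two-tailed p-value for testing H0: rho = 0 given a sample correlation r.
def corr_pvalue(r, n):
    t = r * sqrt(n - 2) / sqrt(1 - r * r)
    return 2 * stats.t.sf(abs(t), n - 2)

# r = .635 between LiftOps and Scanners, n = 502 responses:
print(corr_pvalue(0.635, 502) < 0.01)  # significant at alpha = .01
```

With n this large, even modest correlations in the matrix clear the α = .01 bar.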
Al Si Cr Ti Zn Pb
Al 1.000
Si .456 1.000
Cr .133 -.073 1.000
Ti .389 .278 .011 1.000
Zn .286 .365 .529 -.083 1.000
Pb -.202 -.053 -.114 -.345 .180 1.000
33 sample size
variables VIF
Intercept
Al 1.500
Si 1.674
Cr 1.761
Ti 1.390
Zn 2.197
Pb 1.281
13.23 If hi > 2(k + 1)/n, then the observation is considered to be a high leverage observation.
a. 2(5 + 1)/72 = .1667. hi = .15 < .1667, therefore this is not a high leverage observation.
b. 2(4 + 1)/100 = .10. hi = .18 > .10, therefore this is a high leverage observation.
c. 2(7 + 1)/240 = .0667. hi = .08 > .0667, therefore this is a high leverage observation.
Learning Objective: 13-8
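The leverage screen in 13.23 reduces to a one-line comparison (function name is ours):

```python
# High-leverage screen: flag an observation when h_i exceeds 2(k + 1)/n,
# where k is the number of predictors and n the sample size.
def high_leverage(h_i, k, n):
    return h_i > 2 * (k + 1) / n

print(high_leverage(0.15, 5, 72))   # a. False
print(high_leverage(0.18, 4, 100))  # b. True
print(high_leverage(0.08, 7, 240))  # c. True
```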
13.24 Assumption #1: Residuals are normally distributed. It appears this assumption has been
violated because the histogram shows a skewed left distribution. Because the data set
is fairly small the normplot is not as useful for detecting non-normality.
Assumption #2: Residuals have constant variance. It appears this assumption has been
violated. The residuals plotted against the predicted Y values show a fan out pattern
which indicates heteroscedasticity or non-constant variance.
Assumption #3: Residuals are independent. It appears this assumption has been violated.
Applying the runs test (see section 12.8) to the Runs plot we see there are 10 crossing
points. If autocorrelation did not exist we would expect approximately 15/2 or 7 to 8
crossings. We have more than 8 crossings so negative autocorrelation is a concern.
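The crossing count used in that runs-test check can be sketched directly (the residuals below are made up for illustration, not the problem's data):

```python
# Count sign changes (crossings of zero) between successive residuals;
# far more crossings than n/2 hints at negative autocorrelation,
# far fewer hints at positive autocorrelation.
def crossings(residuals):
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)

print(crossings([1.2, -0.5, 0.8, -1.1, 0.3]))  # 4: signs alternate
```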
Questions 13.25 through 13.41 refer to 10 different data sets labeled A-J. The answers to each question
are listed for each data set in turn.
13.26 The variable magnitudes are not too different. Weight is approximately 100 times the
magnitude of the other variables but this should not cause problems in the analysis.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a car with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Length      Negative                     Bigger size, lower mileage
Width       Negative                     Bigger size, lower mileage
Weight      Negative                     Bigger size, lower mileage
Japan       Positive                     Japanese cars have a reputation for better mileage
Learning Objective: 12-2
13.28 43/4 = 10.75 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
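The sample-size screens applied throughout the 13.28 answers can be written once (function name is ours):

```python
# Evans' Rule wants n/k >= 10 and Doane's Rule wants n/k >= 5,
# where n is the sample size and k the number of predictors.
def size_rules(n, k):
    ratio = n / k
    return ratio, ratio >= 10, ratio >= 5  # (n/k, Evans, Doane)

print(size_rules(43, 4))  # the vehicle data set: meets both rules
print(size_rules(32, 5))  # the building data set: Doane only
```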
MegaStat Output:
Regression Analysis
R² 0.703
Adjusted R² 0.671 n 43
R 0.838 k 4
Std. Error 2.505 Dep. Var. City
ANOVA table
Source SS df MS F p-value
Regression 563.9264 4 140.9816 22.46 1.40E-09
Residual 238.5387 38 6.2773
Total 802.4651 42
Regression output confidence interval
variables coefficients std. error t (df=38) p-value 95% lower 95% upper
Intercept 43.9932 8.4767 5.190 7.33E-06 26.8330 61.1534
Length -0.0039 0.0445 -0.087 .9311 -0.0939 0.0862
Width -0.1064 0.1395 -0.763 .4501 -0.3888 0.1759
Weight -0.0041 0.0008 -4.955 1.53E-05 -0.0058 -0.0024
Japan -1.3228 0.8146 -1.624 .1127 -2.9718 0.3262
Learning Objective: 13-1
Learning Objective: 13-3
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variable Weight. This means that Weight is the only significant
predictor in the model (at a significance level of .05).
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 38,
t.025 = T.INV(.025, 38) = ±2.024. Only Weight has a significant result, with tcalc =
−4.955 < −2.024.
Learning Objective: 13-3
13.33 Fcalc = 22.46 with a p-value = 1.40E-09. R2 = .703 and R2adj = .671. The model provides
significant fit with a fairly strong prediction of city mileage.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.024(2.505) = ŷi ± 5.07. Yes, this model does have
practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
43 sample size
b. Both Length and Width are significantly correlated with Weight. Collinearity could be a
problem, but according to Klein’s Rule we shouldn’t be overly concerned. (Both .72
and .753 are less than √.703 = .838.)
Learning Objective: 13-6
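The Klein's Rule comparison above is a simple threshold check (function name is ours; values from this problem):

```python
from math import sqrt

# Klein's Rule: worry about collinearity only when a pairwise |r|
# exceeds the model's multiple correlation R = sqrt(R^2).
def klein_concern(r_pair, r_squared):
    return abs(r_pair) > sqrt(r_squared)

print(klein_concern(0.753, 0.703))  # False: .753 < .838
```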
13.36 a.
variables VIF
Intercept
Length 2.672
Width 2.746
Weight 2.907
Japan 1.106
b. The VIF values are all under 3, which suggests that multicollinearity has not caused
instability. In fact, both Length and Width turned out to be insignificant in the model,
which is what one would expect.
Learning Objective: 13-6
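Each VIF in the table comes from regressing that predictor on the others; a NumPy sketch on synthetic, nearly independent columns (not the car data; function name is ours):

```python
import numpy as np

# VIF_j = 1/(1 - R_j^2), where R_j^2 is from regressing column j on the
# remaining predictor columns (with an intercept).
def vif(X):
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # independent columns by construction
print([round(v, 2) for v in vif(X)])  # each value should sit near 1
```

Uncorrelated predictors give VIFs near 1; the values in the table above (all under 3) are well short of common trouble thresholds.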
13.37 Vehicle 42, the Jetta, was the only observation that had an outlier residual.
Learning Objective: 13-8
13.38 Observations 2, 8, 13, and 21 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.39 Assuming normally distributed residuals appears reasonable, apart from one high outlier.
Running a normality test of the residuals (A-D and moment tests) results in a failure,
most likely because of the outlier.
13.40 Assuming homoscedastic residuals appears reasonable. We see the one high outlier on the
plot.
13.27 The intercept would not have meaning. It would not be logical to have a restaurant with
zero values for any of the predictors. A priori reasoning for the relationship between
each predictor and the response variable is listed in the table below.
Predictor      Relationship with Response   Reason
Seats-Inside   Positive    The larger the size of the restaurant, the greater the sales.
Seats-Patio    Positive    The larger the size of the restaurant, the greater the sales.
MedIncome      Positive    The higher the income of the potential customers, the higher the sales.
MedAge         Positive    The older the potential customers, the higher the sales.
BachDeg%       Positive    More education would be positively correlated with higher income and therefore higher sales.
Learning Objective: 12-2
13.28 74/5 = 14.8 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
R² 0.233
Adjusted R² 0.177 n 74
R 0.483 k 5
Std. Error 124.529 Dep. Var. Sales/SqFt
ANOVA table
Source SS df MS F p-value
Regression 320,276.8169 5 64,055.3634 4.13 .0025
Residual 1,054,515.7777 68 15,507.5850
Total 1,374,792.5946 73
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for the
variables Seats-Inside, Seats-Patio, MedIncome, and MedAge all contain zero. Only
BachDeg% is a significant predictor in the model at a significance level of .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 68,
t.025 = T.INV(.025, 68) = ±1.995. Only BachDeg% has a significant result at α = .05,
with tcalc = 3.307 > 1.995.
Learning Objective: 13-3
13.32 a. BachDeg%: p-value = .0015 < .05. Note that MedIncome and Seats-Inside are
both significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 4.13 with a p-value = .0025. R2 = .233 and R2adj = .177. The model provides
significant fit but does not provide strong prediction of restaurant sales/sqft.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.995(124.529) = ŷi ± 248.4354. No, this model does
not have practical value because the interval is so wide.
Learning Objective: 12-9
13.35 a.
74 sample size
13.36 a.
variables VIF
Intercept
Seats-Inside 1.045
Seats-Patio 1.039
MedIncome 1.807
MedAge 1.267
BachDeg% 1.584
b. The VIF values are all under 2 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Observations 6, 19, 22, 46, and 69 would be considered unusual or outlier residuals.
Learning Objective: 13-8
13.38 Observations 14, 19, 23, and 69 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.40 Assuming homoscedastic residuals appears reasonable. There is no obvious fan out or
funnel pattern in the plot below.
13.26 The variable magnitudes are not too different. Floor space is approximately 1000 times the
magnitude of the other predictor variables but is similar to the response variable.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a building with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Floor       Positive    Bigger size, higher value
Offices     Positive    More offices, higher value
Entrances   Positive    More entrances, higher value
Age         Negative    Increase in age means more maintenance, lower value
Freeway     Positive    Closer access to freeway, higher value
Learning Objective: 12-2
13.28 32/5 = 6.4. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
R² 0.967
Adjusted R² 0.961 n 32
R 0.983 k 5
Std. Error 90.189 Dep. Var. Assessed
ANOVA table
Source SS df MS F p-value
Regression 6,225,261.2561 5 1,245,052.2512 153.07 2.01E-18
Residual 211,486.6189 26 8,134.1007
Total 6,436,747.8750 31
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero for
the variables Entrances and Age. This means that Floor, Offices, and Freeway are
significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 26,
t.025 = T.INV(.025, 26) = ±2.056. Floor, Offices, and Freeway all have significant
results with tcalc = 11.494, 3.175, and 3.341, respectively (all greater than 2.056).
Learning Objective: 13-3
13.33 Fcalc = 153.07 with a p-value = 2.01E-18. R2 = .967 and R2adj = .961. The model provides
significant fit with a very strong prediction of building assessed value.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.056(90.189) = ŷi ± 185.429. Yes, this model does
have practical value. The prediction interval width should provide valuable
information.
Learning Objective: 12-9
13.35 a.
32 sample size
b. Both Offices and Entrances are significantly correlated with Floor. Collinearity is
most likely not a problem according to Klein’s Rule: the correlation coefficient
values are less than √.967 = .983.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Floor 3.757
Offices 3.267
Entrances 1.638
Age 1.169
Freeway 1.185
b. The VIF values are all under 4 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Building 5 was the only observation that had an unusual residual.
Learning Objective: 13-8
13.39 The histogram shows a slight left skewed distribution but is unimodal with no obvious
outliers. The normplot is a fairly straight line on the diagonal. Assuming normally
distributed residuals appears reasonable.
13.40 Assuming homoscedastic residuals appears reasonable. The residual plot does not show a
fan out or funnel pattern.
13.27 The intercept would not have meaning. While it might be possible to have 0% change in
currency demands and deposits, it would not be logical to have a zero unemployment
rate or zero utilization of manufacturing capacity. A priori reasoning for the
relationship between each predictor and the response variable is listed in the table
below.
Predictor   Relationship with Response   Reason
CapUtil Positive Greater utilization, increase in CPI
ChgM1 Negative Increase in deposits, CPI stays stable
ChgM2 Positive Increase in small deposits, CPI increases
Unem Positive Unemployment increases, CPI increases
Learning Objective: 12-2
13.28 41/4 = 10.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
R² 0.225
Adjusted R² 0.139 n 41
R 0.474 k 4
Std. Error 2.623 Dep. Var. ChgCPI
ANOVA table
Source SS df MS F p-value
Regression 71.8691 4 17.9673 2.61 .0514
Residual 247.5957 36 6.8777
Total 319.4649 40
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for ChgM1
and ChgM2 both contain zero. This means that neither of these predictors is
significant at α = .05. The confidence intervals for CapUtil and Unem do not contain
zero, so both predictors are significant at the .05 level.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 36,
t.025 = T.INV(.025, 36) = ±2.028. CapUtil has tcalc = 2.231 and Unem has tcalc =
2.572; both are greater than 2.028, which indicates significance.
Learning Objective: 13-3
13.32 a. CapUtil: p-value = .0320 and Unem: p-value = .0144. Both are significant at α = .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 2.61 with a p-value = .0514. R2 = .225 and R2adj = .139. The model does not provide
significant fit at α = .05.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.028(2.623) = ŷi ± 5.319. Based on the response to
question 13.33 and the wide prediction interval, no, this model does not have practical
value.
Learning Objective: 12-9
13.35 a.
41 sample size
13.36 a.
variables VIF
Intercept
CapUtil 1.785
ChgM1 1.420
ChgM2 1.171
Unem 2.192
b. The VIF values are all under 3 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 1974, 1979, and 1980 were years that had unusual or outlier residuals.
Learning Objective: 13-8
13.38 1992 and 2001 were high leverage years. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8
13.39 The normplot is not as straight a line as one would like to see. The histogram of
residuals is right skewed with several possible outliers. Assuming normally
distributed residuals is questionable.
13.40 Assuming homoscedastic residuals appears reasonable although one might question the
slight increase in residual magnitude for the predictions of greater positive change.
13.41 A test for autocorrelation is warranted. DW = 0.75 which suggests positive autocorrelation.
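The Durbin-Watson statistic cited in 13.41 is the ratio of the sum of squared successive residual differences to the residual sum of squares; a sketch with made-up residuals (not the CPI model's):

```python
import numpy as np

# DW near 2 suggests no autocorrelation; values well below 2 (like the
# 0.75 reported above) suggest positive autocorrelation.
def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Smoothly drifting residuals give a DW close to 0:
print(round(durbin_watson([1.0, 1.1, 1.2, 1.1, 1.0, 0.9]), 3))
```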
13.26 The variable magnitudes are not too different. Education spending by state is
approximately 100 times the magnitude of the other variables but this should not
cause a problem.
Learning Objective: 13-9
13.27 The intercept would not have meaning. None of the quantitative predictor variables would
logically be zero. A priori reasoning for the relationship between each predictor and
the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Dropout     Negative    Higher dropout rate, lower college graduation rate
EdSpend     Positive    Higher spending, higher college graduation rate
Metro%      Positive    Greater urban population, higher college graduation rate
Age         Negative    Older population, fewer attending college
LPRFem      Positive    More women in workforce, more college graduates
Neast       Positive    With Midwest as the base, more college graduates in the northeast
Seast       Positive    With Midwest as the base, more college graduates in the southeast
West        Positive    With Midwest as the base, more college graduates in the west
Learning Objective: 12-2
13.28 50/8 = 6.25 > 5. The data set meets Doane’s Rule but does not meet Evans’.
Learning Objective: 13-1
R² 0.692
Adjusted R² 0.632 n 50
R 0.832 k 8
Std. Error 3.099 Dep. Var. ColGrad%
ANOVA table
Source       SS          df   MS        F       p-value
Regression     885.4526   8   110.6816  11.53   2.16E-08
Residual       393.7026  41     9.6025
Total        1,279.1552  49
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables Metro%, LPRFem, and Neast. These three variables are the
only predictors significant at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41,
t.025 =T.INV(.025,41) = ±2.020. The t statistics for Metro%, LPRFem, and Neast are
3.621, 2.913, and 3.366, respectively. Each value is greater than 2.02.
Learning Objective: 13-3
13.32 a. Metro%: p-value =.0008, LPRFem: p-value = .0058, and Neast: p-value = .0017.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 11.53 with a p-value = 2.16E-08. R2 = .692 and R2adj = .632. The model provides
significant fit with a fairly strong prediction of state college graduation rates.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.02(3.099) = ŷi ± 6.26. Yes, this model does have
practical value.
13.35 a.
50 sample size
b. The only two variables that have an r value that might raise a red flag are LPRFem
and Dropout. However, this should not be a problem according to Klein’s Rule because
.741 < √.692 = .832.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Dropout 2.637
EdSpend 1.447
Metro% 1.684
AgeMed 1.886
LPRFem 3.174
Neast 2.182
Seast 2.827
West 1.890
b. The VIF values are all 3 or less (except for LPRFem, which is slightly greater than 3),
which suggests that multicollinearity has not caused instability.
Learning Objective: 13-6
13.37 Delaware and Wyoming are the only two states that showed unusual or outlier residuals.
Learning Objective: 13-8
13.38 Utah and West Virginia had high leverage values. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8
13.39 Although the normplot is not as straight a line as one would like and the histogram is
slightly skewed to the left, the distribution is unimodal with tapering tails and there
are no obvious outliers. Assuming normally distributed residuals appears reasonable.
13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that the residual variation increases as the percentage of a state’s population
living in metropolitan areas increases. Taking a log transform of the dependent
variable would not be a good solution here because the magnitudes of the variables
are similar. There may be lurking variables we could add to the model to correct this
problem.
13.27 The intercept would not have meaning. It would not be logical to have an aircraft with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below. Note that the
variable Age is calculated by subtracting the year of manufacture from 2010.
Predictor    Relationship with Response   Reason
Age Negative Older engine, lower speed
TotalHP Positive Bigger engine, higher speed
NumBlades Positive More blades, higher speed
Turbo Positive Stronger engine, higher speed
Learning Objective: 12-2
13.28 55/4 = 13.75. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables TotalHP and Turbo. Only TotalHP and Turbo are significant
predictors at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 50,
t.025 = T.INV(.025, 50) = ±2.009. The predictor TotalHP has tcalc = 9.167 and Turbo
has tcalc = 2.537; both are greater than 2.009, therefore we reject the null
hypotheses and conclude their coefficients are not equal to zero.
Learning Objective: 13-3
13.32 a. TotalHP: p-value = 2.76E-12 and Turbo: p-value = .0143. Both p-values are less than
.05. Note that the variable Age has a p-value = .0541, which is significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 41.40 with a p-value = 2.75E-15. R2 = .768 and R2adj = .750. The model provides
significant fit with a fairly strong prediction of aircraft cruising speed.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.009(18.097) = ŷi ± 36.357. Yes, this model does
have practical value.
Learning Objective: 12-9
13.35 a.
            TotalHP  NumBlades  Turbo    Age
TotalHP      1.000
NumBlades     .491     1.000
Turbo         .096      .388    1.000
Age           .154     -.180    -.030   1.000
55 sample size
b. The VIF values are all under 2, which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.38 Observations 3, 8, 43, and 46 have high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.27 The intercept would not have meaning. It would not be logical for a particular compound
to have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor   Relationship with Response
MW Negative
BP Positive
RI Negative
H1 Negative
H2 Negative
H3 Negative
H4 Negative
Learning Objective: 12-2
13.28 35/7 = 5. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
MegaStat Output:
Regression Analysis
R² 0.987
Adjusted R² 0.983 n 35
R 0.993 k 7
Std. Error 8.571 Dep. Var. Ret
ANOVA table
Source SS df MS F p-value
Regression 146,878.2005 7 20,982.6001 285.64 1.27E-23
Residual 1,983.3648 27 73.4580
Total 148,861.5653 34
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variable BP (Boiling Point). Only BP is a significant predictor at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 27,
t.025 = T.INV(.025, 27) = ±2.052. The predictor BP has tcalc = 8.139 > 2.052, so BP is significant.
Learning Objective: 13-3
13.33 Fcalc = 285.64 with a p-value = 1.27E-23. R2 = .987 and R2adj = .983. The model provides
significant fit with a fairly strong prediction of compound retention time. However,
residual assumptions should be verified.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.052(8.571) = ŷi ± 17.588. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9
13.35 a.
MW BP RI H1 H2 H3 H4
MW 1.000
BP .906 1.000
RI .240 .580 1.000
H1 .065 -.218 -.747 1.000
H2 -.233 -.336 -.202 -.194 1.000
H3 -.214 -.185 -.145 -.194 -.094 1.000
H4 .117 .167 .202 -.316 -.153 -.153 1.000
35 sample size
13.36 a.
variables VIF
Intercept
MW 21.409
BP 31.113
RI 13.115
H1 9.235
H2 2.816
H3 2.458
H4 1.793
b. The VIF values for MW, BP, RI, and H1 are all high. It is possible that they are causing
variance inflation, which could cause some instability. If the regression analysis is
rerun using only BP as the predictor variable, very little changes in the fit statistics
and the scatterplot shows a very strong linear relationship.
Learning Objective: 13-6
13.38 Observation 24 has a high leverage value. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8
13.39 While the normplot and histogram are not a perfect representation of a normal distribution,
there are no strong departures from normality. Assuming normally distributed
residuals appears reasonable.
13.40 The model appears to underestimate retention at the low end and the high end of the range
of values. It might be prudent to explore a nonlinear relationship with retention and
BP.
13.27 The intercept would not have meaning. It would not be logical for a particular state to
have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor          Relationship with Response
MassLayoff         Positive
SubprimeShare      Positive
PriceIncomeRatio   Positive
Homeownership      Negative
5YrApp             Positive
UnempChange        Positive
%HousMoved         Positive
Learning Objective: 12-2
13.28 50/7 = 7.14. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
MegaStat Output:
Regression Analysis
R² 0.739
Adjusted R² 0.696 n 50
R 0.860 k 7
Std. Error 3.732 Dep. Var. Foreclosure
ANOVA table
Source SS df MS F p-value
Regression 1,657.8345 7 236.8335 17.01 2.01E-10
Residual 584.9417 42 13.9272
Total 2,242.7762 49
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables UnempChange and %HousMoved. These variables were the
only two significant at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 42,
t.025 = T.INV(.025, 42) = ±2.018. The predictors UnempChange and %HousMoved have
t statistics of 2.132 and 6.912, respectively; both are greater than 2.018.
Learning Objective: 13-3
13.33 Fcalc = 17.01 with a p-value = 2.01E-10. R2 = .739 and R2adj = .696. The model provides
significant fit with a fairly strong prediction of foreclosure rates. However, residual
assumptions should be verified.
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.018(3.732) = ŷi ± 7.531. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9
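The half-width calculation and the quick rule (2·se) can be compared in a short sketch, using the standard error and df from this model:

```python
# t-based prediction-interval half-width vs. the 2*se quick rule.
from scipy import stats

def pi_half_width(se, df, alpha=0.05):
    """Approximate (1 - alpha) prediction-interval half-width."""
    return stats.t.ppf(1 - alpha / 2, df) * se

exact = pi_half_width(3.732, 42)   # about 7.53
quick = 2 * 3.732                  # 7.464 -- close to the exact value
```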
13.35 a.
                   MassLayoff  SubprimeShare  PriceIncomeRatio  Homeownership  5YrApp  UnempChange  %HousMoved
MassLayoff              1.000
SubprimeShare           -.022          1.000
PriceIncomeRatio         .073          -.119             1.000
Homeownership           -.045           .130             -.501          1.000
5YrApp                   .045          -.105              .786          -.434   1.000
UnempChange              .280           .124              .200          -.066    .211        1.000
%HousMoved              -.144          -.507             -.314           .229   -.119        -.143       1.000
n = 50 (sample size)
b. Collinearity might be a problem. Several pairs of variables have significant correlation. In particular, 5YrApp and PriceIncomeRatio have r = .786, close in value to √.739 = .860.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
MassLayoff 1.123
SubprimeShare 1.653
PriceIncomeRatio 3.447
Homeownership 1.405
5YrApp 2.868
UnempChange 1.170
%HousMoved 1.901
b. The VIF values are all 4 or less. Concern about multicollinearity is not high.
Learning Objective: 13-6
13.38 California, Mississippi, Nevada, and Vermont have high leverage values. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.
Learning Objective: 13-8
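Leverage values like these are the diagonal of the hat matrix; a sketch, using the common 2(k+1)/n cutoff for "high" leverage (variable names are illustrative):

```python
# Leverage (hat values) from a design matrix X that includes the
# intercept column; flag observations above the 2p/n rule of thumb.
import numpy as np

def hat_values(X):
    """Diagonal of H = X (X'X)^-1 X'."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

def high_leverage(X):
    n, p = X.shape
    return np.where(hat_values(X) > 2 * p / n)[0]
```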
13.39 Residuals do not appear to be normally distributed. A histogram shows a right-skewed distribution.
13.40 The residual plot shows some indication that the model is overestimating foreclosure rates
in the middle range of values. A plot of residuals against unemployment shows
heteroscedasticity.
13.27 The intercept would not have meaning. A priori hypotheses for the relationship between each predictor and the response variable are listed in the table below.

Predictor   Relationship with Response
Age         Positive
Weight      Positive
Height      Neutral
Neck        Positive
Chest       Positive
Abdomen     Positive
Hip         Positive
Thigh       Positive
Learning Objective: 12-2
13.28 50/8 = 6.25. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
13.29 MegaStat Output:

Regression Analysis
R² 0.841
Adjusted R² 0.810 n 50
R 0.917 k 8
Std. Error 3.957 Dep. Var. Fat%
ANOVA table
Source SS df MS F p-value
Regression 3,399.1446 8 424.8931 27.14 4.82E-14
Residual 641.8882 41 15.6558
Total 4,041.0328 49
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals contain
zero except for the variables Weight, Abdomen, and Thigh. This means that these three
variables are the only significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41, t.025 = T.INV(.025, 41) = ±2.020. Weight, Abdomen, and Thigh have t statistics equal to 2.462, 5.570, and 2.678, all greater than 2.02 in absolute value, and are all significant predictors.
Learning Objective: 13-3
13.32 a. Weight: p-value = .0181, Abdomen: p-value = 1.77E-06, and Thigh: p-value = .0106.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 27.14 with a p-value = 4.82E-14. R2 = .841 and R2adj = .810. The model provides
significant fit with a fairly strong prediction of percent body fat.
Learning Objective: 13-2
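The overall F statistic and its p-value can be recomputed from the ANOVA table above; a sketch using scipy:

```python
# Recompute Fcalc and its p-value from the body-fat ANOVA table.
from scipy import stats

ss_reg, df_reg = 3399.1446, 8
ss_res, df_res = 641.8882, 41

F = (ss_reg / df_reg) / (ss_res / df_res)   # about 27.14
p = stats.f.sf(F, df_reg, df_res)           # far below .05
```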
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.020(3.957) = ŷi ± 7.993. Yes, this model does have
practical value.
Learning Objective: 12-9
13.35 a. n = 50 (sample size)
b. According to Klein's Rule, many pairs of predictors cause concern for collinearity. Several correlation coefficients are greater than √.841 = .917.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Age 1.712
Weight 31.111
Height 1.689
Neck 5.472
Chest 11.275
Abdomen 17.714
Hip 25.899
Thigh 11.931
b. The VIF values are high except for Age, Height, and Neck. Multicollinearity is a
concern.
Learning Objective: 13-6
13.38 Observations 5, 15, 36, 39, and 42 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.39 Assuming normally distributed residuals appears reasonable, although the histogram appears slightly right-skewed.
13.26 The variable magnitudes are different. The response variable magnitude is in tens of
thousands and the predictor variables are integers between 0 and 50.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a car with zero values for any of the predictors. A priori reasoning for the relationship between each predictor and the response variable is listed in the table below.

Predictor   Relationship with Response   Reason
Age         Negative                     Older car, lower price
Car         Negative                     Cars are less expensive than Vans, the base indicator variable
Truck       Positive                     Trucks are more expensive than Vans, the base indicator variable
SUV         Positive                     SUVs are more expensive than Vans, the base indicator variable
Learning Objective: 12-2
13.28 637/4 = 159.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
13.29 MegaStat Output:
Regression Analysis
R² 0.139
Adjusted R² 0.134 n 637
R 0.373 k 4
Std. Error 8573.178 Dep. Var. Price
ANOVA table
Source SS df MS F p-value
Regression 7,512,691,865.5390 4 1,878,172,966.3848 25.55 1.20E-19
Residual 46,451,606,047.1737 632 73,499,376.6569
Total 53,964,297,912.7127 636
13.30 Refer to the output in question 13.29. The 95% coefficient confidence interval for the
indicator variable Car contains zero. The other three variables are significant
predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 632, t.025 = T.INV(.025, 632) = ±1.964. Age, Truck, and SUV have t statistics equal to 5.897, 4.359, and 2.963, respectively, all greater than 1.964 in absolute value.
Learning Objective: 13-3
13.32 a. Age: p-value = 6.02E-09, Truck: p-value = 1.52E-05, and SUV: p-value = .0032. All three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 25.55 with a p-value = 1.20E-19. R2 = .139 and R2adj = .134. The model provides
significant fit but the model is not a strong predictive equation.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.964(8573.178) = ŷi ± 16,837.722. No, this model does
not have practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
n = 637 (sample size)
b. There is no concern for collinearity. While there is significant correlation between the
indicator variables, this is to be expected because of the way they are defined. We
would expect to see correlation between indicator variables defined on the same
characteristic.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Age 1.017
Car 3.201
Truck 2.662
SUV 2.749
b. The VIF values are all 4 or less which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Vehicles 212, 342, and 502 had extremely high outlier residuals. Vehicles 246, 397, 631,
and 632 were unusual/outlier residuals. The next step would be to investigate the
three high outlier residuals for possible exclusion from the data set. It is possible their
prices were mistyped or the vehicles do not fit the profile of a vehicle for which the
model is being developed.
Learning Objective: 13-8
13.38 There were many observations with high leverage, too many to list here. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.
13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for possible removal from the data set.
13.40 Assuming homoscedastic residuals is not reasonable. There are obviously outliers in the
data set. Removing the outliers and rerunning the analysis will most likely show
heteroscedasticity in the residual plot.
13.27 The intercept would not have meaning. It would not be logical to have a freestyle time when the age of the swimmer is zero. A priori reasoning for the relationship between each predictor and the response variable is listed in the table below.

Predictor   Relationship with Response   Reason
Seed        Positive                     The lowest seeded swimmers should have the lowest times, and vice versa
Gender      Positive                     If Gender = 1 indicates female, it is possible the women have slower times than the men
Age         Positive                     The older a swimmer is, the slower their times will be
Learning Objective: 12-2
13.28 198/3 = 66. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
ANOVA table
Source       SS               df    MS               F         p-value
Regression   4,755,790.7205     3   1,585,263.5735   1041.45   2.64E-119
Residual       295,300.5055   194       1,522.1676
Total        5,051,091.2260   197
13.30 Refer to the output in question 13.29. None of the 95% confidence intervals for the three coefficients contains zero. All three variables are significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 194,
t.025 =T.INV(.025,194) = ±1.972. All variables have t statistic values greater than
1.972 therefore all predictors are significant.
Learning Objective: 13-3
13.32 a. Seed: p-value = 8.29E-90, Gender: p-value = .0440, and Age: p-value = .0463. All three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 1041.45 with a p-value = 2.64E-119. R2 = .942 and R2adj = .941. The model provides
significant fit and is a strong predictor of finishing times.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.972(39.015) = ŷi ± 76.938. Yes, this model does have practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
b. There is no concern for collinearity. While there is significant correlation between the
indicator variables, the correlation is not high enough to cause concern.
Learning Objective: 13-6
13.36 a.
variables VIF
Seed 2.086
Gender 1.411
Age 1.870
b. The VIF values are all 3 or less which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Seven swimmers had residuals that were either unusual (greater than 2 but less than 3) or outliers (greater than 3). This means the model underestimated their finishing times. Six swimmers had residuals that were either unusual (less than −2 but greater than −3) or outliers (less than −3). This means the model overestimated their finishing times.
Learning Objective: 13-8
13.38 There were nine observations with high leverage. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8
13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for possible removal from the data set. The histogram shows outliers on both the low and high ends.
13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that as the seed time increases, the variation in residuals increases.
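A crude numeric companion to the fan-out diagnosis: on synthetic data whose spread grows with the seed time, the rank correlation between |residual| and seed is clearly positive. The data and names here are illustrative, not the chapter's.

```python
# Synthetic fan-out: residual spread proportional to seed time, so the
# Spearman correlation between seed and |residual| should be positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
seed_time = np.sort(rng.uniform(60, 600, size=200))
resid = rng.normal(0, 0.05 * seed_time)              # heteroscedastic by design

rho, p = stats.spearmanr(seed_time, np.abs(resid))   # rho positive, p small
```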
13.42 a. Each coefficient measures the additional revenue earned by selling one more unit (one more car, truck, or SUV, respectively).
b. The intercept is not meaningful. Ford has to sell at least one car, truck, or SUV to earn revenue. No sales means no revenue.
c. The error term might consist of factors such as the price of fuel, which heavily
influences vehicle sales, and the state of the economy. Sales are lower when the
economy is unpredictable because people hold onto their cars longer. In addition, the
predictor variables are highly correlated to each other (multicollinearity problem), as
well as related to “missing variables” that influence their sales as well as revenue.
Learning Objective: 12-2
Learning Objective: 13-6
13.43 There are no quantitative predictor variables. A better approach would be to use an ANOVA
procedure that compares means within groups. In addition, the sample size is too
small relative to the number of predictors. There would have to be 6 binary variables
to cover the suppliers and substrate categories. With only 11 observations this would
violate both Evans’ Rule and Doane’s Rule.
Learning Objective: 13-5
marginally better in terms of fit statistics than the one or two variable models. No
gain in fit is achieved by adding LifeExp and Density.
2. Examination of the individual regression coefficients indicates that the InfMort and
Literate have p-values < .01 and GDPCap has a p-value < 0.05.
3. Conclusion: Infant mortality and literacy rate have the greatest impact on birth
rates.
Learning Objective: 13-2
13.47 a. Yes, the coefficients make sense, except for TrnOvr. One would think that turnovers
would actually reduce the number of wins, not increase them.
b. No. It is negative and the number of games won is limited to zero or greater.
c. One needs either 5 (Doane's Rule) or 10 (Evans' Rule) observations per predictor. There are 6 predictor variables in the model, so we need a minimum of 30 observations to meet Doane's Rule. The fact that there were only 23 teams, and therefore only 23 observations, means the sample size was probably too small to make the model reliable.
d. Rebounds and points are highly correlated. We don't need both of them in the model. This could be inflating the variance of the predictor estimates, causing the predictors to appear insignificant.
Learning Objective: 13-3
Learning Objective: 13-6
13.49 a. Both men and women who had prior marathon experience had lower times on
average than those who were running for the first time.
b. No, the intercept does not have any meaning. If all predictor/binary variables were 0 then you wouldn't have an individual racer.
c. It is suspected that non-linearity is present among age, weight, and height. In this model, increasing age decreases time, but at an increasing rate; increasing weight decreases time, but at an increasing rate; and increasing height increases time, but at a decreasing rate.
d. The model predicted that I would run the marathon in about 12 and ½ hours. And
that could be right. I can walk 4 mph so it would take at least 6 to 7 hours minimum!
Learning Objective: 13-3
13.50 The three predictors of CityMPG are most likely strongly correlated with each other. The
VIF values do not show any concern (all less than 3) but we see that the variables
Length and Width are not significant predictors of gas mileage. The variable Weight is a significant predictor (p-value = .0000) and, according to the R2 value of .682, explains approximately 68% of the variation in CityMPG.
Learning Objective: 13-2
Learning Objective: 13-3
13.51 While the four-predictor model gives the highest R2 (.474) and lowest standard error (143.362), the predictor Divorce is not significant. The decrease in R2 (.454) after removing Divorce is quite small, so the three-predictor model would be the best choice.
Learning Objective: 13-2
Learning Objective: 13-3