Chapter 13
Multiple Regression
13.1 a. ŷ = 4.306 − 0.082ShipCost + 2.265PrintAds + 2.498WebAds + 16.697Rebate%
b. The coefficient of ShipCost says that each additional $1 of shipping cost reduces net
revenue by about $0.082.
The coefficient of PrintAds says that each additional $1000 of printed ads adds about
$2,265 to net revenue.
The coefficient of WebAds says that each additional $1000 of web ads adds about
$2,498 to net revenue.
The coefficient of Rebate% says that each additional percentage in the rebate rate
adds about $16,700 to net revenue.
c. The intercept is meaningless. You have to supply some product, so shipping cost
can’t be zero. You don’t have to have a rebate or ads; those can be zero.
d. NetRevenue = 4.306 − 0.082(10) + 2.265(50) + 2.498(40) + 16.697(15) = 467.111
thousand, or $467,111.
Learning Objective: 13-1
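The point prediction in 13.1(d) can be checked with a minimal sketch (the dictionary names are ours; the coefficients are those reported in part a):

```python
# Fitted coefficients from 13.1(a); x holds the values plugged in for 13.1(d).
coefs = {"Intercept": 4.306, "ShipCost": -0.082, "PrintAds": 2.265,
         "WebAds": 2.498, "Rebate%": 16.697}
x = {"ShipCost": 10, "PrintAds": 50, "WebAds": 40, "Rebate%": 15}

# Linear prediction: intercept plus the sum of coefficient * value.
y_hat = coefs["Intercept"] + sum(coefs[k] * v for k, v in x.items())
print(round(y_hat, 3))  # net revenue in $ thousands
```

Multiplying out each term and summing reproduces the 467.111 thousand reported above.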
13.9 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)
Predictor     tcalc        p-value
Intercept     0.0608585    0.9517414
ShipCost     -0.0175289    0.9860922
PrintAds      2.1571429    0.0363725
WebAds        2.9537661    0.0049772
Rebate%       4.6770308
b. t.005 = T.INV(.005, 45) = ±2.69. WebAds and Rebate% differ significantly from zero
(p-value < .01 and |tcalc| > 2.69).
c. See table in part a.
Learning Objective: 13-3
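The Excel-style two-tailed p-values used throughout these answers can be reproduced with scipy (a sketch; the helper name is ours):

```python
from scipy import stats

# tcalc = (bj - 0)/sj; Excel's T.DIST.2T(|tcalc|, df) is the two-tailed
# p-value of Student's t, mirrored here with scipy's survival function.
def p_two_tail(t_calc, df):
    return 2 * stats.t.sf(abs(t_calc), df)

# The WebAds row of 13.9: tcalc = 2.9537661 with df = 45.
print(round(p_two_tail(2.9537661, 45), 7))
```

The printed value matches the .0049772 p-value in the table above.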
13.10 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 26)
13.11 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 497)
13.12 a.
tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)
13.13 Use ŷi ± tα/2se with 34 df, t.025 = T.INV(.025, 34) = 2.032. (Use the positive value.)
Half width of 95% prediction interval = 2.032(3620) =7355.84
Using the quick rule the half width =2se = 2(3620) = 7240
Yes, the quick rule gives similar results.
Learning Objective: 12-9
13.14 Use ŷi ± tα/2se with 20 df, t.025 = T.INV(.025, 20) = 2.086. (Use the positive value.)
Half width of 95% prediction interval =2.086 (1.17) =2.4406
Using the quick rule the half width = 2se = 2(1.17) = 2.34
Yes, the quick rule gives similar results.
Learning Objective: 12-9
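The half-width calculations in 13.13 and 13.14 can be sketched together (helper name is ours; scipy's ppf(0.975, df) matches |T.INV(.025, df)|):

```python
from scipy import stats

# 95% prediction-interval half-width t.025*se versus the "quick rule" 2*se.
def half_widths(se, df):
    t = stats.t.ppf(0.975, df)       # positive two-tailed critical value
    return t * se, 2 * se            # (exact half-width, quick-rule half-width)

print(half_widths(3620, 34))  # 13.13: roughly (7356, 7240)
print(half_widths(1.17, 20))  # 13.14: roughly (2.44, 2.34)
```

In both problems the quick rule lands close to the exact half-width, as noted above.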
13.15 a. Number of nights (NumNight) needed and number of bedrooms (NumBedrooms) are
both discrete.
b. Two: SwimPool = 1 if there is a swimming pool and ParkGarage = 1 if there is a
parking garage
c. CondoPrice = β0 + β1NumNights + β2NumBedrooms + β3SwimPool + β4ParkGarage
Learning Objective: 13-3
Learning Objective: 13-5
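The two binaries in 13.15(b) enter the model as 0/1 columns; a small sketch with made-up condo rows (not real data) shows the encoding:

```python
import pandas as pd

# Hypothetical condo listings; HasPool/HasGarage are raw booleans that
# become the 0/1 regressors SwimPool and ParkGarage from 13.15(b).
df = pd.DataFrame({"NumNights": [3, 7], "NumBedrooms": [2, 3],
                   "HasPool": [True, False], "HasGarage": [False, True]})
df["SwimPool"] = df["HasPool"].astype(int)
df["ParkGarage"] = df["HasGarage"].astype(int)
print(df[["SwimPool", "ParkGarage"]].values.tolist())  # [[1, 0], [0, 1]]
```

These indicator columns sit alongside the quantitative predictors in the design matrix for the model in part c.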
13.19 a. The scatter plot shows an obvious increasing trend but it is nonlinear rather than
linear. The increase in salary is much steeper in the earlier years than in the later
years. A nonlinear model would be appropriate.
b. MegaStat Output is below: R2 = .915, Fcalc =194.99, p-value = .0000. Yes, the model is
significant.
Regression Analysis
R² 0.915
Adjusted R² 0.911 n 39
R 0.957 k 2
Std. Error 8.757 Dep. Var. Salary ($1000)
ANOVA table
Source        SS           df   MS           F        p-value
Regression    29,901.9728   2   14,950.9864  194.99   4.84E-20
Residual       2,760.3861  36       76.6774
Total         32,662.3590  38
c. Years: p-value = .0000, YearsSq: p-value = .0000. Both of these predictors are
significant.
Learning Objective: 13-3
13.20 MegaStat Output is below: Male: p-value = .5009, YearsxMale: p-value = .0505. The
binary variable Male is not significant. The interaction variable YearsxMale is
significant because its p-value is less than .10. The coefficient on the interaction term
is positive, which means that as men gain more years of experience their salaries tend
to rise relative to women’s.
Regression Analysis
R² 0.945
Adjusted R² 0.939 n 39
R 0.972 k 4
Std. Error 7.269 Dep. Var. Salary ($1000)
ANOVA table
Source        SS           df   MS          F        p-value
Regression    30,865.7542   4   7,716.4386  146.03   6.59E-21
Residual       1,796.6048  34      52.8413
Total         32,662.3590  38
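The interaction regressor tested in 13.20 is just the product of the two columns; a toy sketch (not the salary data set):

```python
import pandas as pd

# Hypothetical rows; the interaction column YearsxMale is Years * Male,
# so it is zero for women and equals Years for men.
df = pd.DataFrame({"Years": [2, 5, 5], "Male": [1, 0, 1]})
df["YearsxMale"] = df["Years"] * df["Male"]
print(df["YearsxMale"].tolist())  # [2, 0, 5]
```

Because the column is zero whenever Male = 0, its coefficient measures how the slope on Years differs for men.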
13.21 a. All but one pair of variables is significantly correlated at α = .01. See the matrix
below. LiftOps and Scanners (r = .635), Crowds and LiftWait (r = .577), AmountGr
and TrailGr (r = .531), SkiSafe and SpSeen (r = .488)
Correlation Matrix
          scanners liftops liftwait trailv snosurf crowds amountgr trailgr skisafe spseen hosts
scanners    1.000
liftops      .635   1.000
liftwait     .146    .180   1.000
trailv       .115    .206    .128   1.000
snosurf      .190    .242    .227    .373   1.000
crowds       .245    .299    .577    .235    .348   1.000
amountgr     .245    .271    .251    .221    .299    .372   1.000
trailgr      .266    .337    .205    .360    .358    .362    .531   1.000
skisafe      .200    .306    .196    .172    .200    .332    .274    .323   1.000
spseen       .145    .190    .207    .172    .184    .230    .149    .172    .488   1.000
hosts        .245    .278    .046    .140    .119    .133    .128    .156    .212    .350   1.000
502 sample size
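The significance screen behind 13.21(a) uses tcalc = r√(n − 2)/√(1 − r²) with n − 2 df; a sketch (helper name is ours), applied to the LiftOps and Scanners correlation:

```python
from math import sqrt
from scipy import stats

# Two-tailed p-value for testing H0: rho = 0 given a sample correlation r.
def corr_pvalue(r, n):
    t = r * sqrt(n - 2) / sqrt(1 - r * r)
    return 2 * stats.t.sf(abs(t), n - 2)

# r = .635 between LiftOps and Scanners, n = 502 responses:
print(corr_pvalue(0.635, 502) < 0.01)  # significant at alpha = .01
```

With n this large, even modest correlations in the matrix clear the α = .01 bar.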
Al Si Cr Ti Zn Pb
Al 1.000
Si .456 1.000
Cr .133 -.073 1.000
Ti .389 .278 .011 1.000
Zn .286 .365 .529 -.083 1.000
Pb -.202 -.053 -.114 -.345 .180 1.000
33 sample size
variables VIF
Intercept
Al 1.500
Si 1.674
Cr 1.761
Ti 1.390
Zn 2.197
Pb 1.281
13.23 If hi > 2(k + 1)/n, then the observation is considered to be a high leverage observation.
a. 2(5 + 1)/72 = .1667. hi = .15 < .1667, therefore this is not a high leverage observation.
b. 2(4 + 1)/100 = .10. hi = .18 > .10, therefore this is a high leverage observation.
c. 2(7 + 1)/240 = .0667. hi = .08 > .0667, therefore this is a high leverage observation.
Learning Objective: 13-8
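The leverage screen in 13.23 reduces to a one-line comparison (function name is ours):

```python
# High-leverage screen: flag an observation when h_i exceeds 2(k + 1)/n,
# where k is the number of predictors and n the sample size.
def high_leverage(h_i, k, n):
    return h_i > 2 * (k + 1) / n

print(high_leverage(0.15, 5, 72))   # a. False
print(high_leverage(0.18, 4, 100))  # b. True
print(high_leverage(0.08, 7, 240))  # c. True
```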
13.24 Assumption #1: Residuals are normally distributed. It appears this assumption has been
violated because the histogram shows a skewed left distribution. Because the data set
is fairly small the normplot is not as useful for detecting non-normality.
Assumption #2: Residuals have constant variance. It appears this assumption has been
violated. The residuals plotted against the predicted Y values show a fan out pattern
which indicates heteroscedasticity or non-constant variance.
Assumption #3: Residuals are independent. It appears this assumption has been violated.
Applying the runs test (see section 12.8) to the Runs plot we see there are 10 crossing
points. If autocorrelation did not exist we would expect approximately 15/2 or 7 to 8
crossings. We have more than 8 crossings so negative autocorrelation is a concern.
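The crossing count used in that runs-test check can be sketched directly (the residuals below are made up for illustration, not the problem's data):

```python
# Count sign changes (crossings of zero) between successive residuals;
# far more crossings than n/2 hints at negative autocorrelation,
# far fewer hints at positive autocorrelation.
def crossings(residuals):
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)

print(crossings([1.2, -0.5, 0.8, -1.1, 0.3]))  # 4: signs alternate
```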
Questions 13.25 through 13.41 refer to 10 different data sets labeled A-J. The answers to each question
are listed for each data set in turn.
13.26 The variable magnitudes are not too different. Weight is approximately 100 times the
magnitude of the other variables but this should not cause problems in the analysis.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a car with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Length      Negative                     Bigger size, lower mileage
Width       Negative                     Bigger size, lower mileage
Weight      Negative                     Bigger size, lower mileage
Japan       Positive                     Japanese cars have a reputation for better mileage
Learning Objective: 12-2
13.28 43/4 = 10.75 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
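The sample-size screens applied throughout the 13.28 answers can be written once (function name is ours):

```python
# Evans' Rule wants n/k >= 10 and Doane's Rule wants n/k >= 5,
# where n is the sample size and k the number of predictors.
def size_rules(n, k):
    ratio = n / k
    return ratio, ratio >= 10, ratio >= 5  # (n/k, Evans, Doane)

print(size_rules(43, 4))  # the vehicle data set: meets both rules
print(size_rules(32, 5))  # the building data set: Doane only
```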
MegaStat Output:
Regression Analysis
R² 0.703
Adjusted R² 0.671 n 43
R 0.838 k 4
Std. Error 2.505 Dep. Var. City
ANOVA table
Source SS df MS F p-value
Regression 563.9264 4 140.9816 22.46 1.40E-09
Residual 238.5387 38 6.2773
Total 802.4651 42
Regression output confidence interval
variables coefficients std. error t (df=38) p-value 95% lower 95% upper
Intercept 43.9932 8.4767 5.190 7.33E-06 26.8330 61.1534
Length -0.0039 0.0445 -0.087 .9311 -0.0939 0.0862
Width -0.1064 0.1395 -0.763 .4501 -0.3888 0.1759
Weight -0.0041 0.0008 -4.955 1.53E-05 -0.0058 -0.0024
Japan -1.3228 0.8146 -1.624 .1127 -2.9718 0.3262
Learning Objective: 13-1
Learning Objective: 13-3
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variable Weight. This means that Weight is the only significant
predictor in the model (at a significance level of .05).
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 38,
t.025 = T.INV(.025, 38) = ±2.024. Only Weight has a significant result, with tcalc =
−4.955 < −2.024.
Learning Objective: 13-3
13.33 Fcalc = 22.46 with a p-value = 1.40E-09. R2 = .703 and R2adj = .671. The model provides
significant fit with a fairly strong prediction of city mileage.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.024(2.505) = ŷi ± 5.07. Yes, this model does have
practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
43 sample size
b. Both Length and Width are significantly correlated with Weight. Collinearity could be a
problem, but according to Klein’s Rule we shouldn’t be overly concerned. (Both .72
and .753 are less than √.703 = .838.)
Learning Objective: 13-6
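The Klein's Rule comparison above is a simple threshold check (function name is ours; values from this problem):

```python
from math import sqrt

# Klein's Rule: worry about collinearity only when a pairwise |r|
# exceeds the model's multiple correlation R = sqrt(R^2).
def klein_concern(r_pair, r_squared):
    return abs(r_pair) > sqrt(r_squared)

print(klein_concern(0.753, 0.703))  # False: .753 < .838
```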
13.36 a.
variables VIF
Intercept
Length 2.672
Width 2.746
Weight 2.907
Japan 1.106
b. The VIF values are all under 3, which suggests that multicollinearity has not caused
instability. In fact, both Length and Width turned out to be insignificant in the model,
which is what one would expect.
Learning Objective: 13-6
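Each VIF in the table comes from regressing that predictor on the others; a NumPy sketch on synthetic, nearly independent columns (not the car data; function name is ours):

```python
import numpy as np

# VIF_j = 1/(1 - R_j^2), where R_j^2 is from regressing column j on the
# remaining predictor columns (with an intercept).
def vif(X):
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # independent columns by construction
print([round(v, 2) for v in vif(X)])  # each value should sit near 1
```

Uncorrelated predictors give VIFs near 1; the values in the table above (all under 3) are well short of common trouble thresholds.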
13.37 Vehicle 42, the Jetta, was the only observation that had an outlier residual.
Learning Objective: 13-8
13.38 Observations 2, 8, 13, and 21 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.39 Assuming normally distributed residuals appears reasonable, apart from one high outlier.
Running a normality test of the residuals (A-D and moment tests) results in a failure,
most likely because of the outlier.
13.40 Assuming homoscedastic residuals appears reasonable. We see the one high outlier on the
plot.
13.27 The intercept would not have meaning. It would not be logical to have a restaurant with
zero values for any of the predictors. A priori reasoning for the relationship between
each predictor and the response variable is listed in the table below.
Predictor      Relationship with Response   Reason
Seats-Inside   Positive    The larger the size of the restaurant, the greater the sales.
Seats-Patio    Positive    The larger the size of the restaurant, the greater the sales.
MedIncome      Positive    The higher the income of the potential customers, the higher the sales.
MedAge         Positive    The older the potential customers, the higher the sales.
BachDeg%       Positive    More education would be positively correlated with higher income and therefore higher sales.
Learning Objective: 12-2
13.28 74/5 = 14.8 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
R² 0.233
Adjusted R² 0.177 n 74
R 0.483 k 5
Std. Error 124.529 Dep. Var. Sales/SqFt
ANOVA table
Source SS df MS F p-value
Regression 320,276.8169 5 64,055.3634 4.13 .0025
Residual 1,054,515.7777 68 15,507.5850
Total 1,374,792.5946 73
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for the
variables Seats-Inside, Seats-Patio, MedIncome, and MedAge all contain zero. Only
BachDeg% is a significant predictor in the model at a significance level of .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 68,
t.025 = T.INV(.025, 68) = ±1.995. Only BachDeg% has a significant result at α = .05,
with tcalc = 3.307 > 1.995.
Learning Objective: 13-3
13.32 a. BachDeg%: p-value = .0015 < .05. Note that MedIncome and Seats-Inside are
both significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 4.13 with a p-value = .0025. R2 = .233 and R2adj = .177. The model provides
significant fit but does not provide strong prediction of restaurant sales/sqft.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.995(124.529) = ŷi ± 248.4354. No, this model does
not have practical value because the interval is so wide.
Learning Objective: 12-9
13.35 a.
74 sample size
13.36 a.
variables VIF
Intercept
Seats-Inside 1.045
Seats-Patio 1.039
MedIncome 1.807
MedAge 1.267
BachDeg% 1.584
b. The VIF values are all under 2 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Observations 6, 19, 22, 46, and 69 would be considered unusual or outlier residuals.
Learning Objective: 13-8
13.38 Observations 14, 19, 23, and 69 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.40 Assuming homoscedastic residuals appears reasonable. There is no obvious fan out or
funnel pattern in the plot below.
13.26 The variable magnitudes are not too different. Floor space is approximately 1000 times the
magnitude of the other predictor variables but is similar to the response variable.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a building with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Floor       Positive    Bigger size, higher value
Offices     Positive    More offices, higher value
Entrances   Positive    More entrances, higher value
Age         Negative    Increase in age means more maintenance, lower value
Freeway     Positive    Closer access to freeway, higher value
Learning Objective: 12-2
13.28 32/5 = 6.4. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
R² 0.967
Adjusted R² 0.961 n 32
R 0.983 k 5
Std. Error 90.189 Dep. Var. Assessed
ANOVA table
Source SS df MS F p-value
Regression 6,225,261.2561 5 1,245,052.2512 153.07 2.01E-18
Residual 211,486.6189 26 8,134.1007
Total 6,436,747.8750 31
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero for
the variables Entrances and Age. This means that Floor, Offices, and Freeway are
significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 26,
t.025 = T.INV(.025, 26) = ±2.056. Floor, Offices, and Freeway all have significant
results with tcalc = 11.494, 3.175, and 3.341, respectively (all greater than 2.056).
Learning Objective: 13-3
13.33 Fcalc = 153.07 with a p-value = 2.01E-18. R2 = .967 and R2adj = .961. The model provides
significant fit with a very strong prediction of building assessed value.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.056(90.189) = ŷi ± 185.429. Yes, this model does
have practical value. The prediction interval width should provide valuable
information.
Learning Objective: 12-9
13.35 a.
32 sample size
b. Both Offices and Entrances are significantly correlated with Floor. Collinearity is
most likely not a problem according to Klein’s Rule: the correlation coefficient
values are less than √.967 = .983.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Floor 3.757
Offices 3.267
Entrances 1.638
Age 1.169
Freeway 1.185
b. The VIF values are all under 4 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Building 5 was the only observation that had an unusual residual.
Learning Objective: 13-8
13.39 The histogram shows a slight left skewed distribution but is unimodal with no obvious
outliers. The normplot is a fairly straight line on the diagonal. Assuming normally
distributed residuals appears reasonable.
13.40 Assuming homoscedastic residuals appears reasonable. The residual plot does not show a
fan out or funnel pattern.
13.27 The intercept would not have meaning. While it might be possible to have 0% change in
currency demands and deposits, it would not be logical to have a zero unemployment
rate or zero utilization of manufacturing capacity. A priori reasoning for the
relationship between each predictor and the response variable is listed in the table
below.
Predictor   Relationship with Response   Reason
CapUtil Positive Greater utilization, increase in CPI
ChgM1 Negative Increase in deposits, CPI stays stable
ChgM2 Positive Increase in small deposits, CPI increases
Unem Positive Unemployment increases, CPI increases
Learning Objective: 12-2
13.28 41/4 = 10.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
R² 0.225
Adjusted R² 0.139 n 41
R 0.474 k 4
Std. Error 2.623 Dep. Var. ChgCPI
ANOVA table
Source SS df MS F p-value
Regression 71.8691 4 17.9673 2.61 .0514
Residual 247.5957 36 6.8777
Total 319.4649 40
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for ChgM1
and ChgM2 both contain zero. This means that neither of these predictors is
significant at α = .05. The confidence intervals for CapUtil and Unem do not contain
zero, so both predictors are significant at the .05 level.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 36,
t.025 = T.INV(.025, 36) = ±2.028. CapUtil has tcalc = 2.231 and Unem has tcalc =
2.572; both are greater than 2.028, which indicates significance.
Learning Objective: 13-3
13.32 a. CapUtil: p-value = .0320 and Unem: p-value = .0144. Both are significant at α = .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 2.61 with a p-value = .0514. R2 = .225 and R2adj = .139. The model does not provide
significant fit at α = .05.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.028(2.623) = ŷi ± 5.319. Based on the response to
question 13.33 and the wide prediction interval, no, this model does not have practical
value.
Learning Objective: 12-9
13.35 a.
41 sample size
13.36 a.
variables VIF
Intercept
CapUtil 1.785
ChgM1 1.420
ChgM2 1.171
Unem 2.192
b. The VIF values are all under 3 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 1974, 1979, and 1980 were years that had unusual or outlier residuals.
Learning Objective: 13-8
13.38 1992 and 2001 were high leverage years. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8
13.39 The normplot is not as straight a line as one would like to see. The histogram of
residuals is right skewed with several possible outliers. Assuming normally
distributed residuals is questionable.
13.40 Assuming homoscedastic residuals appears reasonable although one might question the
slight increase in residual magnitude for the predictions of greater positive change.
13.41 A test for autocorrelation is warranted. DW = 0.75 which suggests positive autocorrelation.
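The Durbin-Watson statistic cited in 13.41 is the ratio of the sum of squared successive residual differences to the residual sum of squares; a sketch with made-up residuals (not the CPI model's):

```python
import numpy as np

# DW near 2 suggests no autocorrelation; values well below 2 (like the
# 0.75 reported above) suggest positive autocorrelation.
def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Smoothly drifting residuals give a DW close to 0:
print(round(durbin_watson([1.0, 1.1, 1.2, 1.1, 1.0, 0.9]), 3))
```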
13.26 The variable magnitudes are not too different. Education spending by state is
approximately 100 times the magnitude of the other variables but this should not
cause a problem.
Learning Objective: 13-9
13.27 The intercept would not have meaning. None of the quantitative predictor variables would
logically be zero. A priori reasoning for the relationship between each predictor and
the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Dropout     Negative    Higher dropout rate, lower college graduation rate
EdSpend     Positive    Higher spending, higher college graduation rate
Metro%      Positive    Greater urban population, higher college graduation rate
Age         Negative    Older population, fewer attending college
LPRFem      Positive    More women in workforce, more college graduates
Neast       Positive    With Midwest as the base, more college graduates in the northeast
Seast       Positive    With Midwest as the base, more college graduates in the southeast
West        Positive    With Midwest as the base, more college graduates in the west
Learning Objective: 12-2
13.28 50/8 = 6.25 > 5. The data set meets Doane’s Rule but does not meet Evans’.
Learning Objective: 13-1
R² 0.692
Adjusted R² 0.632 n 50
R 0.832 k 8
Std. Error 3.099 Dep. Var. ColGrad%
ANOVA table
Source       SS          df   MS        F       p-value
Regression     885.4526   8   110.6816  11.53   2.16E-08
Residual       393.7026  41     9.6025
Total        1,279.1552  49
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables Metro%, LPRFem, and Neast. These three variables are the
only predictors significant at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41,
t.025 =T.INV(.025,41) = ±2.020. The t statistics for Metro%, LPRFem, and Neast are
3.621, 2.913, and 3.366, respectively. Each value is greater than 2.02.
Learning Objective: 13-3
13.32 a. Metro%: p-value =.0008, LPRFem: p-value = .0058, and Neast: p-value = .0017.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 11.53 with a p-value = 2.16E-08. R2 = .692 and R2adj = .632. The model provides
significant fit with a fairly strong prediction of state college graduation rates.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.02(3.099) = ŷi ± 6.26. Yes, this model does have
practical value.
13.35 a.
50 sample size
b. The only two variables that have an r value that might raise a red flag are LPRFem
and Dropout. However, this should not be a problem according to Klein’s Rule because
.741 < √.692 = .832.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Dropout 2.637
EdSpend 1.447
Metro% 1.684
AgeMed 1.886
LPRFem 3.174
Neast 2.182
Seast 2.827
West 1.890
b. The VIF values are all 3 or less (except for LPRFem, which is slightly greater than 3),
which suggests that multicollinearity has not caused instability.
Learning Objective: 13-6
13.37 Delaware and Wyoming are the only two states that showed unusual or outlier residuals.
Learning Objective: 13-8
13.38 Utah and West Virginia had high leverage values. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8
13.39 Although the normplot is not as straight a line as one would like and the histogram is
slightly skewed to the left, the distribution is unimodal with tapering tails and there
are no obvious outliers. Assuming normally distributed residuals appears reasonable.
13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that the residual variation increases as the percentage of a state’s population
living in metropolitan areas increases. Taking a log transform of the dependent
variable would not be a good solution here because the magnitudes of the variables
are similar. There may be lurking variables we could add to the model to correct this
problem.
13.27 The intercept would not have meaning. It would not be logical to have an aircraft with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below. Note that the
variable Age is calculated by subtracting the year of manufacture from 2010.
Predictor    Relationship with Response   Reason
Age Negative Older engine, lower speed
TotalHP Positive Bigger engine, higher speed
NumBlades Positive More blades, higher speed
Turbo Positive Stronger engine, higher speed
Learning Objective: 12-2
13.28 55/4 = 13.75. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables TotalHP and Turbo. Only TotalHP and Turbo are significant
predictors at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 50,
t.025 = T.INV(.025, 50) = ±2.009. The predictor TotalHP has tcalc = 9.167 and Turbo
has tcalc = 2.537; both are greater than 2.009, therefore we reject the null
hypotheses and conclude their coefficients are not equal to zero.
Learning Objective: 13-3
13.32 a. TotalHP: p-value = 2.76E-12 and Turbo: p-value = .0143. Both p-values are less than
.05. Note that the variable Age has a p-value = .0541, which is significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3
13.33 Fcalc = 41.40 with a p-value = 2.75E-15. R2 = .768 and R2adj = .750. The model provides
significant fit with a fairly strong prediction of aircraft cruising speed.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.009(18.097) = ŷi ± 36.357. Yes, this model does
have practical value.
Learning Objective: 12-9
13.35 a.
            TotalHP  NumBlades  Turbo    Age
TotalHP      1.000
NumBlades     .491     1.000
Turbo         .096      .388    1.000
Age           .154     -.180    -.030   1.000
55 sample size
b. The VIF values are all under 2, which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.38 Observations 3, 8, 43, and 46 have high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.27 The intercept would not have meaning. It would not be logical for a particular compound
to have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor   Relationship with Response
MW Negative
BP Positive
RI Negative
H1 Negative
H2 Negative
H3 Negative
H4 Negative
Learning Objective: 12-2
13.28 35/7 = 5. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
MegaStat Output:
Regression Analysis
R² 0.987
Adjusted R² 0.983 n 35
R 0.993 k 7
Std. Error 8.571 Dep. Var. Ret
ANOVA table
Source SS df MS F p-value
Regression 146,878.2005 7 20,982.6001 285.64 1.27E-23
Residual 1,983.3648 27 73.4580
Total 148,861.5653 34
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variable BP (Boiling Point). Only BP is a significant predictor at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 27,
t.025 = T.INV(.025, 27) = ±2.052. The predictor BP has tcalc = 8.139 > 2.052, so BP is significant.
Learning Objective: 13-3
13.33 Fcalc = 285.64 with a p-value = 1.27E-23. R2 = .987 and R2adj = .983. The model provides
significant fit with a fairly strong prediction of compound retention time. However,
residual assumptions should be verified.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.052(8.571) = ŷi ± 17.588. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9
13.35 a.
MW BP RI H1 H2 H3 H4
MW 1.000
BP .906 1.000
RI .240 .580 1.000
H1 .065 -.218 -.747 1.000
H2 -.233 -.336 -.202 -.194 1.000
H3 -.214 -.185 -.145 -.194 -.094 1.000
H4 .117 .167 .202 -.316 -.153 -.153 1.000
35 sample size
13.36 a.
variables VIF
Intercept
MW 21.409
BP 31.113
RI 13.115
H1 9.235
H2 2.816
H3 2.458
H4 1.793
b. The VIF values for MW, BP, RI, and H1 are all high. It is possible that they are causing
variance inflation, which could cause some instability. If the regression analysis is
rerun using only BP as the predictor variable, very little changes in the fit statistics
and the scatterplot shows a very strong linear relationship.
Learning Objective: 13-6
13.38 Observation 24 has a high leverage value. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8
13.39 While the normplot and histogram are not a perfect representation of a normal distribution,
there are no strong departures from normality. Assuming normally distributed
residuals appears reasonable.
13.40 The model appears to underestimate retention at the low end and the high end of the range
of values. It might be prudent to explore a nonlinear relationship with retention and
BP.
13.27 The intercept would not have meaning. It would not be logical for a particular state to
have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor          Relationship with Response
MassLayoff         Positive
SubprimeShare      Positive
PriceIncomeRatio   Positive
Homeownership      Negative
5YrApp             Positive
UnempChange        Positive
%HousMoved         Positive
Learning Objective: 12-2
13.28 50/7 = 7.14. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
MegaStat Output:
Regression Analysis
R² 0.739
Adjusted R² 0.696 n 50
R 0.860 k 7
Std. Error 3.732 Dep. Var. Foreclosure
ANOVA table
Source SS df MS F p-value
Regression 1,657.8345 7 236.8335 17.01 2.01E-10
Residual 584.9417 42 13.9272
Total 2,242.7762 49
13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables UnempChange and %HousMoved. These variables were the
only two significant at α = .05.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 42,
t.025 = T.INV(.025, 42) = ±2.018. The predictors UnempChange and %HousMoved have
t statistics of 2.132 and 6.912, respectively; both are greater than 2.018.
Learning Objective: 13-3
13.33 Fcalc = 17.01 with a p-value = 2.01E-10. R2 = .739 and R2adj = .696. The model provides
significant fit with a fairly strong prediction of foreclosure rates. However, residual
assumptions should be verified.
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.018(3.732) = ŷi ± 7.531. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9
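The half-width calculation and the quick rule (2·se) can be compared in a short sketch, using the standard error and df from this model:

```python
# t-based prediction-interval half-width vs. the 2*se quick rule.
from scipy import stats

def pi_half_width(se, df, alpha=0.05):
    """Approximate (1 - alpha) prediction-interval half-width."""
    return stats.t.ppf(1 - alpha / 2, df) * se

exact = pi_half_width(3.732, 42)   # about 7.53
quick = 2 * 3.732                  # 7.464 -- close to the exact value
```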
13.35 a.
                   MassLayoff  SubprimeShare  PriceIncomeRatio  Homeownership  5YrApp  UnempChange  %HousMoved
MassLayoff              1.000
SubprimeShare           -.022          1.000
PriceIncomeRatio         .073          -.119             1.000
Homeownership           -.045           .130             -.501          1.000
5YrApp                   .045          -.105              .786          -.434   1.000
UnempChange              .280           .124              .200          -.066    .211        1.000
%HousMoved              -.144          -.507             -.314           .229   -.119        -.143       1.000
n = 50 (sample size)
b. Collinearity might be a problem. Several pairs of variables have significant correlation. In particular, 5YrApp and PriceIncomeRatio have r = .786, close in value to √.739 = .860.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
MassLayoff 1.123
SubprimeShare 1.653
PriceIncomeRatio 3.447
Homeownership 1.405
5YrApp 2.868
UnempChange 1.170
%HousMoved 1.901
b. The VIF values are all 4 or less. Concern about multicollinearity is not high.
Learning Objective: 13-6
13.38 California, Mississippi, Nevada, and Vermont have high leverage values. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.
Learning Objective: 13-8
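Leverage values like these are the diagonal of the hat matrix; a sketch, using the common 2(k+1)/n cutoff for "high" leverage (variable names are illustrative):

```python
# Leverage (hat values) from a design matrix X that includes the
# intercept column; flag observations above the 2p/n rule of thumb.
import numpy as np

def hat_values(X):
    """Diagonal of H = X (X'X)^-1 X'."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

def high_leverage(X):
    n, p = X.shape
    return np.where(hat_values(X) > 2 * p / n)[0]
```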
13.39 Residuals do not appear to be normally distributed. A histogram shows a right-skewed distribution.
13.40 The residual plot shows some indication that the model is overestimating foreclosure rates
in the middle range of values. A plot of residuals against unemployment shows
heteroscedasticity.
13.27 The intercept would not have meaning. A priori hypotheses for the relationship between each predictor and the response variable are listed in the table below.

Predictor   Relationship with Response
Age         Positive
Weight      Positive
Height      Neutral
Neck        Positive
Chest       Positive
Abdomen     Positive
Hip         Positive
Thigh       Positive
Learning Objective: 12-2
13.28 50/8 = 6.25. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1
13.29 MegaStat Output:

Regression Analysis
R² 0.841
Adjusted R² 0.810 n 50
R 0.917 k 8
Std. Error 3.957 Dep. Var. Fat%
ANOVA table
Source SS df MS F p-value
Regression 3,399.1446 8 424.8931 27.14 4.82E-14
Residual 641.8882 41 15.6558
Total 4,041.0328 49
13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals contain
zero except for the variables Weight, Abdomen, and Thigh. This means that these three
variables are the only significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41, t.025 = T.INV(.025, 41) = ±2.020. Weight, Abdomen, and Thigh have t statistics equal to 2.462, 5.570, and 2.678, all greater than 2.02 in absolute value, and are all significant predictors.
Learning Objective: 13-3
13.32 a. Weight: p-value = .0181, Abdomen: p-value = 1.77E-06, and Thigh: p-value = .0106.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 27.14 with a p-value = 4.82E-14. R2 = .841 and R2adj = .810. The model provides
significant fit with a fairly strong prediction of percent body fat.
Learning Objective: 13-2
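The overall F statistic and its p-value can be recomputed from the ANOVA table above; a sketch using scipy:

```python
# Recompute Fcalc and its p-value from the body-fat ANOVA table.
from scipy import stats

ss_reg, df_reg = 3399.1446, 8
ss_res, df_res = 641.8882, 41

F = (ss_reg / df_reg) / (ss_res / df_res)   # about 27.14
p = stats.f.sf(F, df_reg, df_res)           # far below .05
```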
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.020(3.957) = ŷi ± 7.993. Yes, this model does have
practical value.
Learning Objective: 12-9
13.35 a. n = 50 (sample size)
b. According to Klein's Rule, many pairs of predictors cause concern for collinearity. Several correlation coefficients are greater than √.841 = .917.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Age 1.712
Weight 31.111
Height 1.689
Neck 5.472
Chest 11.275
Abdomen 17.714
Hip 25.899
Thigh 11.931
b. The VIF values are high except for Age, Height, and Neck. Multicollinearity is a
concern.
Learning Objective: 13-6
13.38 Observations 5, 15, 36, 39, and 42 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8
13.39 Assuming normally distributed residuals appears reasonable, although the histogram appears slightly right-skewed.
13.26 The variable magnitudes are different. The response variable magnitude is in tens of
thousands and the predictor variables are integers between 0 and 50.
Learning Objective: 13-9
13.27 The intercept would not have meaning. It would not be logical to have a car with zero values for any of the predictors. A priori reasoning for the relationship between each predictor and the response variable is listed in the table below.

Predictor   Relationship with Response   Reason
Age         Negative                     Older car, lower price
Car         Negative                     Cars are less expensive than Vans, the base indicator variable
Truck       Positive                     Trucks are more expensive than Vans, the base indicator variable
SUV         Positive                     SUVs are more expensive than Vans, the base indicator variable
Learning Objective: 12-2
13.28 637/4 = 159.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
13.29 MegaStat Output:
Regression Analysis
R² 0.139
Adjusted R² 0.134 n 637
R 0.373 k 4
Std. Error 8573.178 Dep. Var. Price
ANOVA table
Source SS df MS F p-value
Regression 7,512,691,865.5390 4 1,878,172,966.3848 25.55 1.20E-19
Residual 46,451,606,047.1737 632 73,499,376.6569
Total 53,964,297,912.7127 636
13.30 Refer to the output in question 13.29. The 95% coefficient confidence interval for the
indicator variable Car contains zero. The other three variables are significant
predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 632, t.025 = T.INV(.025, 632) = ±1.964. Age, Truck, and SUV have t statistics equal to 5.897, 4.359, and 2.963, respectively, all greater than 1.964 in absolute value.
Learning Objective: 13-3
13.32 a. Age: p-value = 6.02E-09, Truck: p-value = 1.52E-05, and SUV: p-value = .0032. All three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 25.55 with a p-value = 1.20E-19. R2 = .139 and R2adj = .134. The model provides
significant fit but the model is not a strong predictive equation.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.964(8573.178) = ŷi ± 16,837.722. No, this model does
not have practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
n = 637 (sample size)
b. There is no concern for collinearity. While there is significant correlation between the
indicator variables, this is to be expected because of the way they are defined. We
would expect to see correlation between indicator variables defined on the same
characteristic.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Age 1.017
Car 3.201
Truck 2.662
SUV 2.749
b. The VIF values are all 4 or less which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Vehicles 212, 342, and 502 had extremely high outlier residuals. Vehicles 246, 397, 631,
and 632 were unusual/outlier residuals. The next step would be to investigate the
three high outlier residuals for possible exclusion from the data set. It is possible their
prices were mistyped or the vehicles do not fit the profile of a vehicle for which the
model is being developed.
Learning Objective: 13-8
13.38 There were many observations with high leverage, too many to list here. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.
13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for possible removal from the data set.
13.40 Assuming homoscedastic residuals is not reasonable. There are obviously outliers in the
data set. Removing the outliers and rerunning the analysis will most likely show
heteroscedasticity in the residual plot.
13.27 The intercept would not have meaning. It would not be logical to have a freestyle time when the age of the swimmer is zero. A priori reasoning for the relationship between each predictor and the response variable is listed in the table below.

Predictor   Relationship with Response   Reason
Seed        Positive                     The lowest seeded swimmers should have the lowest times, and vice versa
Gender      Positive                     If Gender = 1 indicates female, it is possible the women have slower times than the men
Age         Positive                     The older a swimmer is, the slower their times will be
Learning Objective: 12-2
13.28 198/3 = 66. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1
ANOVA table
Source       SS               df    MS               F         p-value
Regression   4,755,790.7205     3   1,585,263.5735   1041.45   2.64E-119
Residual       295,300.5055   194       1,522.1676
Total        5,051,091.2260   197
13.30 Refer to the output in question 13.29. None of the 95% confidence intervals for the three coefficients contains zero. All three variables are significant predictors in the model.
Learning Objective: 13-4
13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 194,
t.025 =T.INV(.025,194) = ±1.972. All variables have t statistic values greater than
1.972 therefore all predictors are significant.
Learning Objective: 13-3
13.32 a. Seed: p-value = 8.29E-90, Gender: p-value = .0440, and Age: p-value = .0463. All three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because it conveys the strength of each predictor's significance.
Learning Objective: 13-3
13.33 Fcalc = 1041.45 with a p-value = 2.64E-119. R2 = .942 and R2adj = .941. The model provides
significant fit and is a strong predictor of finishing times.
Learning Objective: 13-2
13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.972(39.015) = ŷi ± 76.938. Yes, this model does have practical value.
Learning Objective: 12-9
13.35 a.
Correlation Matrix
b. There is no concern for collinearity. While there is significant correlation between the
indicator variables, the correlation is not high enough to cause concern.
Learning Objective: 13-6
13.36 a.
variables VIF
Seed 2.086
Gender 1.411
Age 1.870
b. The VIF values are all 3 or less which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
13.37 Seven swimmers had residuals that were either unusual (greater than 2 but less than 3) or outliers (greater than 3). This means the model underestimated their finishing times. Six swimmers had residuals that were either unusual (less than −2 but greater than −3) or outliers (less than −3). This means the model overestimated their finishing times.
Learning Objective: 13-8
13.38 There were nine observations with high leverage. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8
13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for possible removal from the data set. The histogram shows outliers on both the low and high ends.
13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that as the seed time increases, the variation in residuals increases.
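A crude numeric companion to the fan-out diagnosis: on synthetic data whose spread grows with the seed time, the rank correlation between |residual| and seed is clearly positive. The data and names here are illustrative, not the chapter's.

```python
# Synthetic fan-out: residual spread proportional to seed time, so the
# Spearman correlation between seed and |residual| should be positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
seed_time = np.sort(rng.uniform(60, 600, size=200))
resid = rng.normal(0, 0.05 * seed_time)              # heteroscedastic by design

rho, p = stats.spearmanr(seed_time, np.abs(resid))   # rho positive, p small
```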
13.42 a. Each coefficient measures the additional revenue earned by selling one more unit (one more car, truck, or SUV, respectively).
b. The intercept is not meaningful. Ford has to sell at least one car, truck, or SUV to earn revenue. No sales means no revenue.
c. The error term might consist of factors such as the price of fuel, which heavily
influences vehicle sales, and the state of the economy. Sales are lower when the
economy is unpredictable because people hold onto their cars longer. In addition, the
predictor variables are highly correlated to each other (multicollinearity problem), as
well as related to “missing variables” that influence their sales as well as revenue.
Learning Objective: 12-2
Learning Objective: 13-6
13.43 There are no quantitative predictor variables. A better approach would be to use an ANOVA
procedure that compares means within groups. In addition, the sample size is too
small relative to the number of predictors. There would have to be 6 binary variables
to cover the suppliers and substrate categories. With only 11 observations this would
violate both Evans’ Rule and Doane’s Rule.
Learning Objective: 13-5
marginally better in terms of fit statistics than the one or two variable models. No
gain in fit is achieved by adding LifeExp and Density.
2. Examination of the individual regression coefficients indicates that the InfMort and
Literate have p-values < .01 and GDPCap has a p-value < 0.05.
3. Conclusion: Infant mortality and literacy rate have the greatest impact on birth
rates.
Learning Objective: 13-2
13.47 a. Yes, the coefficients make sense, except for TrnOvr. One would think that turnovers
would actually reduce the number of wins, not increase them.
b. No. It is negative and the number of games won is limited to zero or greater.
c. One needs either 5 (Doane's Rule) or 10 (Evans' Rule) observations per predictor. There are 6 predictor variables in the model, so we need a minimum of 30 observations to meet Doane's Rule. The fact that there were only 23 teams, and therefore only 23 observations, means the sample size was probably too small to make the model reliable.
d. Rebounds and points are highly correlated. We don't need both of them in the model. This could be inflating the variance of the predictor estimates, causing the predictors to appear insignificant.
Learning Objective: 13-3
Learning Objective: 13-6
13.49 a. Both men and women who had prior marathon experience had lower times on
average than those who were running for the first time.
b. No, the intercept does not have any meaning. If all predictor/binary variables were 0 then you wouldn't have an individual racer.
c. It is suspected that non-linearity is present among age, weight, and height. In this model, increasing age decreases time, but at an increasing rate; increasing weight decreases time, but at an increasing rate; and increasing height increases time, but at a decreasing rate.
d. The model predicted that I would run the marathon in about 12 and ½ hours. And
that could be right. I can walk 4 mph so it would take at least 6 to 7 hours minimum!
Learning Objective: 13-3
13.50 The three predictors of CityMPG are most likely strongly correlated with each other. The
VIF values do not show any concern (all less than 3) but we see that the variables
Length and Width are not significant predictors of gas mileage. The variable Weight is a significant predictor (p-value = .0000) and, according to the R2 value of .682, explains approximately 68% of the variation in CityMPG.
Learning Objective: 13-2
Learning Objective: 13-3
13.51 While the four-predictor model gives the highest R2 (.474) and lowest standard error (143.362), the predictor Divorce is not significant. The decrease in R2 (.454) after removing Divorce is quite small, so the three-predictor model would be the best choice.
Learning Objective: 13-2
Learning Objective: 13-3