CH 11
Section 11-2
11-1. a) yi = β0 + β1xi + εi
Sxx = 157.42 − 43²/14 = 25.348571
Sxy = 1697.80 − (43)(572)/14 = −59.057143
β̂1 = Sxy/Sxx = −59.057143/25.348571 = −2.330
β̂0 = ȳ − β̂1x̄ = 572/14 − (−2.3298017)(43/14) = 48.013
b) ŷ = β̂0 + β̂1x
ŷ = 48.012962 − 2.3298017(4.3) = 37.99
c) ŷ = 48.012962 − 2.3298017(3.7) = 39.39
d) e = y − ŷ = 46.1 − 39.39 = 6.71
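As a quick numerical check (not part of the original solution), the summary statistics quoted above (n = 14, Σx = 43, Σy = 572, Σx² = 157.42, Σxy = 1697.80) reproduce the estimates in a short Python sketch:

```python
# Check of Exercise 11-1 from the summary statistics in the solution.
n, sum_x, sum_y = 14, 43.0, 572.0
sum_x2, sum_xy = 157.42, 1697.80

Sxx = sum_x2 - sum_x**2 / n        # corrected sum of squares of x
Sxy = sum_xy - sum_x * sum_y / n   # corrected cross-product sum
b1 = Sxy / Sxx                     # slope estimate
b0 = sum_y / n - b1 * (sum_x / n)  # intercept estimate
y_hat = b0 + b1 * 4.3              # fitted value for part b)

print(b1, b0, y_hat)
```

The printed values agree with β̂1 ≈ −2.3298, β̂0 ≈ 48.013, and ŷ(4.3) ≈ 37.99.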
11-2. a) yi = β0 + β1xi + εi
Sxx = 143215.8 − 1478²/20 = 33991.6
Sxy = 1083.67 − (1478)(12.75)/20 = 141.445
β̂1 = Sxy/Sxx = 141.445/33991.6 = 0.00416
β̂0 = 12.75/20 − (0.0041617512)(1478/20) = 0.32999
ŷ = 0.32999 + 0.00416x
σ̂² = MSE = SSE/(n − 2) = 0.143275/18 = 0.00796
[Scatter plot of y versus x]
d) β̂1 = 0.00416
11-3. a) ŷ = 0.3299892 + 0.0041612((9/5)x + 32)
ŷ = 0.3299892 + 0.0074902x + 0.1331584
ŷ = 0.4631476 + 0.0074902x
b) β̂1 = 0.00749
11-4. a)
Regression Analysis - Linear model: Y = a+bX
Dependent variable: Games Independent variable: Yards
--------------------------------------------------------------------------------
Standard T Prob.
Parameter Estimate Error Value Level
Intercept 21.7883 2.69623 8.081 .00000
Slope -7.0251E-3 1.25965E-3 -5.57703 .00001
--------------------------------------------------------------------------------
Analysis of Variance
Source Sum of Squares Df Mean Square F-Ratio Prob. Level
Model 178.09231 1 178.09231 31.1032 .00001
Residual 148.87197 26 5.72585
--------------------------------------------------------------------------------
Total (Corr.) 326.96429 27
Correlation Coefficient = -0.738027 R-squared = 54.47 percent
Stnd. Error of Est. = 2.39287
σˆ 2 = 5.7258
If the calculations were to be done by hand use Equations (11-7) and (11-8).
[Regression plot: y = 21.7883 − 0.0070251x]
11-5. a)
Regression Analysis - Linear model: Y = a+bX
Dependent variable: SalePrice Independent variable: Taxes
--------------------------------------------------------------------------------
Standard T Prob.
Parameter Estimate Error Value Level
Intercept 13.3202 2.57172 5.17948 .00003
Slope 3.32437 0.390276 8.518 .00000
--------------------------------------------------------------------------------
Analysis of Variance
Source Sum of Squares Df Mean Square F-Ratio Prob. Level
Model 636.15569 1 636.15569 72.5563 .00000
Residual 192.89056 22 8.76775
--------------------------------------------------------------------------------
Total (Corr.) 829.04625 23
Correlation Coefficient = 0.875976 R-squared = 76.73 percent
Stnd. Error of Est. = 2.96104
σˆ 2 = 8.76775
If the calculations were to be done by hand use Equations (11-7) and (11-8).
yˆ = 13.3202 + 3.32437 x
b) yˆ = 13.3202 + 3.32437(7.5) = 38.253
[Plot of predicted versus observed values]
11-6. a)
Regression Analysis - Linear model: Y = a+bX
Dependent variable: Usage Independent variable: Temperature
--------------------------------------------------------------------------------
Standard T Prob.
Parameter Estimate Error Value Level
Intercept -6.3355 1.66765 -3.79906 .00349
Slope 9.20836 0.0337744 272.643 .00000
--------------------------------------------------------------------------------
Analysis of Variance
Source Sum of Squares Df Mean Square F-Ratio Prob. Level
Model 280583.12 1 280583.12 74334.4 .00000
Residual 37.746089 10 3.774609
--------------------------------------------------------------------------------
Total (Corr.) 280620.87 11
Correlation Coefficient = 0.999933 R-squared = 99.99 percent
Stnd. Error of Est. = 1.94284
σˆ 2 = 3.7746
If the calculations were to be done by hand use Equations (11-7) and (11-8).
yˆ = −6.3355 + 9.20836 x
b) yˆ = −6.3355 + 9.20836(55) = 500.124
c) If the monthly average temperature increases by 1°F, y increases by 9.20836.
d) ŷ = −6.3355 + 9.20836(47) = 426.458
e = y − ŷ = 424.84 − 426.458 = −1.618
11-7. a)
Predictor Coef StDev T P
Constant 33.535 2.614 12.83 0.000
x -0.03540 0.01663 -2.13 0.047
σˆ 2 = 13.392
yˆ = 33.5348 − 0.0353971x
b) yˆ = 33.5348 − 0.0353971(150) = 28.226
c) yˆ = 29.4995
e = y − yˆ = 31.0 − 29.4995 = 1.50048
11-8. a) [Scatter plot of y versus x]
11-9. a) [Scatter plot of y versus x]
Yes, a linear regression would seem appropriate, but one or two points appear to be outliers.
Predictor Coef SE Coef T P
Constant -10.132 1.995 -5.08 0.000
x 0.17429 0.02383 7.31 0.000
S = 1.318 R-Sq = 74.8% R-Sq(adj) = 73.4%
Analysis of Variance
Source DF SS MS F P
Regression 1 92.934 92.934 53.50 0.000
Residual Error 18 31.266 1.737
Total 19 124.200
11-10. a) [Scatter plot of y versus x]
11-11. a) [Scatter plot of y versus x]
Yes, a simple linear regression model seems appropriate for these data.
Predictor Coef StDev T P
Constant 0.470 1.936 0.24 0.811
x 20.567 2.142 9.60 0.000
S = 3.716 R-Sq = 85.2% R-Sq(adj) = 84.3%
Analysis of Variance
Source DF SS MS F P
Regression 1 1273.5 1273.5 92.22 0.000
Error 16 220.9 13.8
Total 17 1494.5
b) σˆ 2 = 13.81
yˆ = 0.470467 + 20.5673 x
d) ŷ = 10.1371, e = 1.6629
11-12. a) [Scatter plot of y versus x]
Yes, a simple linear regression (straight-line) model seems plausible for this situation.
b) σˆ 2 = 9811.2
yˆ = 2625.39 − 36.962 x
c) yˆ = 2625.39 − 36.962(20) = 1886.15
d) If there were no error, the values would all lie along the 45° line. The plot indicates that age is a reasonable regressor variable.
[Plot of fitted values (FITS1) versus observed values]
11-13. βˆ0 + βˆ1 x = ( y − βˆ1 x ) + βˆ1 x = y
11-14. a) The slopes of both regression models will be the same, but the intercept will be shifted.
b) yˆ = 2132.41 − 36.9618 x
β̂0 = 2625.39 vs. β̂0* = 2132.41
β̂1 = −36.9618 vs. β̂1* = −36.9618
11-15. Let xi* = xi − x̄. Then the model is Yi* = β0* + β1*xi* + εi.
Equations 11-7 and 11-8 can be applied to the new variables using the facts that Σxi* = Σyi* = 0 (sums over i = 1, …, n). Then β̂1* = β̂1 and β̂0* = 0.
11-16. The least squares estimate minimizes Σ(yi − βxi)². Setting the derivative with respect to β equal to zero, we obtain
Σ2(yi − βxi)(−xi) = 2[βΣxi² − Σyixi] = 0
Therefore, β̂ = Σyixi / Σxi².
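This no-intercept estimator is easy to verify numerically. A minimal Python sketch (the x, y values below are illustrative, not from any exercise):

```python
# Least squares through the origin: beta = sum(x*y) / sum(x^2).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)

# The normal equation requires sum of x_i * (y_i - beta*x_i) to vanish.
residual_moment = sum(xi * (yi - beta * xi) for xi, yi in zip(x, y))
print(beta, residual_moment)
```

The residual moment is zero (up to floating-point error), which is exactly the first-order condition derived above.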
11-17. yˆ = 21.031461x . The model seems very appropriate - an even better fit.
[Scatter plot of chloride versus watershed with the fitted no-intercept line]
Section 11-5
c) se(β̂0) = √[σ̂²(1/n + x̄²/Sxx)] = √[1.8436(1/14 + 3.0714²/25.3486)] = 0.9043
11-19. a) 1) The parameter of interest is the regressor variable coefficient, β1.
2) H 0 :β1 = 0
3) H 1:β1 ≠ 0
4) α = 0.05
5) The test statistic is f0 = MSR/MSE = (SSR/1)/(SSE/(n − 2))
6) Reject H0 if f0 > f0.05,1,18 = 4.414
7) f0 = 0.5886/(0.143275/18) = 73.95
8) Since 73.95 > 4.414, reject H0 and conclude the model specifies a useful relationship at α = 0.05.
P − value ≅ 0.000001
b) se(β̂1) = √(σ̂²/Sxx) = √(0.00796/33991.6) = 4.8391×10⁻⁴
se(β̂0) = √[σ̂²(1/n + x̄²/Sxx)] = √[0.00796(1/20 + 73.9²/33991.6)] = 0.04091
b) se(β̂1) = √(σ̂²/Sxx) = √(5.7257/3608611.43) = 0.001259
se(β̂0) = √[σ̂²(1/n + x̄²/Sxx)] = √[5.7257(1/28 + 2110.13²/3608611.43)] = 2.6962
c) 1) The parameter of interest is the regressor variable coefficient, β1.
2) H 0 : β1 = − 0 .01
3) H 1 : β1 ≠ −0 .01
4) α = 0.01
5) The test statistic is t0 = (β̂1 + 0.01)/se(β̂1)
11-21. Refer to ANOVA of Exercise 11-5
a) 1) The parameter of interest is the regressor variable coefficient, β1.
2) H 0 :β1 = 0
3) H 1:β1 ≠ 0
4) α = 0.05, using t-test
5) The test statistic is t0 = β̂1/se(β̂1)
6) Reject H0 if t0 < −tα/2,n−2 where −t0.025,22 = −2.074, or t0 > t0.025,22 = 2.074
7) Using the results from Exercise 11-5:
t0 = 3.32437/0.390276 = 8.518
8) Since 8.518 > 2.074, reject H0 and conclude the model is useful at α = 0.05.
se(β̂0) = √[σ̂²(1/n + x̄²/Sxx)] = √[8.7675(1/24 + 6.4049²/57.5631)] = 2.5717
11-22. Refer to the ANOVA of Exercise 11-6
a) 1) The parameter of interest is the regressor variable coefficient, β1.
2) H 0 :β1 = 0
3) H 1:β1 ≠ 0
4) α = 0.01
5) The test statistic is f0 = MSR/MSE = (SSR/1)/(SSE/(n − 2))
6) Reject H0 if f0 > f0.01,1,10 = 10.049
7) Using the results from Exercise 11-6:
f0 = (280583.12/1)/(37.746089/10) = 74334.4
8) Since 74334.4 > 10.049, reject H0 and conclude the model is useful at α = 0.01. P-value < 0.000001
b) se( β̂1 ) = 0.0337744, se( β̂ 0 ) = 1.66765
c) 1) The parameter of interest is the regressor variable coefficient, β1.
2) H 0 :β1 = 10
3) H 1:β1 ≠ 10
4) α = 0.01
5) The test statistic is t0 = (β̂1 − β1,0)/se(β̂1)
6) Reject H0 if t0 < −tα/2,n−2 where −t0.005,10 = −3.17, or t0 > t0.005,10 = 3.17
7) Using the results from Exercise 11-6:
t0 = (9.21 − 10)/0.0338 = −23.37
8) Since −23.37 < −3.17 reject H 0 and conclude the slope is not 10 at α = 0.01. P-value = 0.
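The slope test against a non-zero hypothesized value can be checked directly from the reported estimate and standard error (a sketch, using the quantities quoted in this solution):

```python
# t-test of H0: beta1 = 10 using the estimate and standard error
# reported in the Exercise 11-22 solution.
beta1_hat, se_beta1, beta1_0 = 9.21, 0.0338, 10.0
t_crit = 3.17                      # t_{0.005,10} from the t-table

t0 = (beta1_hat - beta1_0) / se_beta1
reject = abs(t0) > t_crit          # two-sided test at alpha = 0.01
print(t0, reject)
```

The statistic matches the −23.37 above, and `reject` is True, agreeing with step 8).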
d) H0: β0 = 0, H1: β0 ≠ 0
t0 = (−6.3355 − 0)/1.66765 = −3.8
P-value < 0.005; reject H0 and conclude that the intercept should be included in the model.
c) H0: β1 = −0.05
H1: β1 < −0.05
α = 0.01
t0 = (−0.0354 − (−0.05))/0.0166281 = 0.87803
t0.01,18 = 2.552
Since t0 is not less than −t0.01,18, do not reject H0. P-value = 0.804251. There is insufficient evidence to conclude that β1 < −0.05.
d) H0: β0 = 0, H1: β0 ≠ 0, α = 0.01
t0 = 12.8291
t0.005,18 = 2.878
Since t0 > t0.005,18, reject H0. P-value ≅ 0.
c) H0: β0 = 0
H1: β0 ≠ 0
α = 0.05
t0 = −1.67718
t0.025,11 = 2.201
Since |t0| < t0.025,11, do not reject H0. P-value = 0.12166.
11-25. Refer to ANOVA of Exercise 11-9
a) H 0 : β1 = 0
H 1 : β1 ≠ 0
α = 0.05
f 0 = 53.50
f .05,1,18 = 4.414
f 0 > f α ,1,18
Therefore, reject H0. P-value = 0.000009.
b) se(β̂1) = 0.0256613
se(β̂0) = 2.13526
c) H0: β0 = 0
H1: β0 ≠ 0
α = 0.05
t0 = −5.079
t0.025,18 = 2.101
Since |t0| > t0.025,18, reject H0. P-value = 0.000078.
d) H0: β0 = 0
H1: β0 ≠ 0
α = 0.01
t0 = 0.243
t0.005,16 = 2.921
Since t0 < t0.005,16, do not reject H0. Yes, the intercept can be removed from the model.
11-27. Refer to ANOVA of Exercise 11-12
a) H 0 : β1 = 0
H 1 : β1 ≠ 0
α = 0.01
f 0 = 155.2
f .01,1,18 = 8.285
f 0 > f α ,1,18
Therefore, reject H0. P-value < 0.00001.
b) se(β̂1) = 2.96681
se(β̂0) = 45.3468
c) H 0 : β1 = −30
H 1 : β1 ≠ −30
α = 0.01
t0 = (−36.9618 − (−30))/2.96681 = −2.3466
t0.005,18 = 2.878
Since |t0| < t0.005,18, do not reject H0. P-value = 0.0153(2) = 0.0306.
d) H 0 : β0 = 0
H1 : β0 ≠ 0
α = 0.01
t0 = 57.8957
t.005 ,18 = 2.878
t 0 > t α / 2,18 , therefore, reject H0. P-value < 0.00001.
e) H0 :β 0 = 2500
H1 : β0 > 2500
α = 0.01
t0 = (2625.39 − 2500)/45.3468 = 2.7651
t0.01,18 = 2.552
Since t0 > t0.01,18, reject H0. P-value = 0.0064.
11-28. t0 = β̂1/√(σ̂²/Sxx). After the transformation, β̂1* = (b/a)β̂1, Sxx* = a²Sxx, x̄* = ax̄, β̂0* = bβ̂0, and σ̂²* = b²σ̂², so the value of t0 is unchanged.
11-29. a) β̂/√(σ̂²/Σxi²) has a t distribution with n − 1 degrees of freedom.
b) From Exercise 11-17, β̂ = 21.031461, σ̂ = 3.611768, and Σxi² = 14.7073.
The t-statistic in part a) is 22.3314 and H0: β = 0 is rejected at the usual α values.
11-30. d = |−0.01 − (−0.005)| / (2.4√(27/3608611.96)) = 0.76, where Sxx = 3608611.96.
Assuming α = 0.05, from Chart VI and interpolating between the curves for n = 20 and n = 30, β ≅ 0.05.
11-32. tα/2,n-2 = t0.005,18 = 2.878
a) βˆ1 ± (t 0.005,18 )se( βˆ1 ) .
0.0041612 ± (2.878)(0.000484)
0.0027682 ≤ β1 ≤ 0.0055542
b) β̂0 ± (t0.005,18)se(β̂0)
0.3299892 ± (2.878)(0.04095)
0.212135 ≤ β 0 ≤ 0.447843
c) 99% confidence interval on μY|x0 when x0 = 85°F.
μ̂Y|x0 = 0.683689
μ̂Y|x0 ± t0.005,18 √[σ̂²(1/n + (x0 − x̄)²/Sxx)]
0.683689 ± (2.878)√[0.00796(1/20 + (85 − 73.9)²/33991.6)]
0.683689 ± 0.0594607
0.6242283 ≤ μY|x0 ≤ 0.7431497
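The interval arithmetic for part c) can be reproduced from the quantities already computed in Exercise 11-2 (a check, not part of the original solution):

```python
import math

# 99% CI on the mean response at x0 = 85, Exercise 11-2 quantities.
n, xbar, Sxx, sigma2 = 20, 73.9, 33991.6, 0.00796
t_005_18 = 2.878                   # t_{0.005,18} from the t-table
mu_hat = 0.683689                  # fitted mean response at x0 = 85
x0 = 85.0

half_width = t_005_18 * math.sqrt(sigma2 * (1/n + (x0 - xbar)**2 / Sxx))
lo, hi = mu_hat - half_width, mu_hat + half_width
print(half_width, lo, hi)
```

The half-width agrees with the 0.0594607 above and the bounds with (0.6242, 0.7431).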
Note for Problems 11-33 through 11-35: These computer printouts were obtained from Statgraphics. For Minitab users, the
standard errors are obtained from the Regression subroutine.
a) −0.00961 ≤ β1 ≤ −0.00444.
b) 16.2448 ≤ β0 ≤ 27.3318.
c) 9.143 ± (2.056)√[5.72585(1/28 + (1800 − 2110.14)²/3608325.5)]
9.143 ± 1.2287
7.9143 ≤ μY|x0 ≤ 10.3717
d) 9.143 ± (2.056)√[5.72585(1 + 1/28 + (1800 − 2110.14)²/3608325.5)]
9.143 ± 5.0709
4.0721 ≤ y0 ≤ 14.2139
11-34. 95 percent confidence intervals for coefficient estimates
--------------------------------------------------------------------------------
Estimate Standard error Lower Limit Upper Limit
CONSTANT 13.3202 2.57172 7.98547 18.6549
Taxes 3.32437 0.39028 2.51479 4.13395
--------------------------------------------------------------------------------
a) 2.51479 ≤ β1 ≤ 4.13395.
b) 7.98547 ≤ β0 ≤ 18.6549.
c) 38.253 ± (2.074)√[8.76775(1/24 + (7.5 − 6.40492)²/57.563139)]
38.253 ± 1.5353
36.7177 ≤ μY|x0 ≤ 39.7883
d) 38.253 ± (2.074)√[8.76775(1 + 1/24 + (7.5 − 6.40492)²/57.563139)]
38.253 ± 6.3302
31.9228 ≤ y0 ≤ 44.5832
a) 9.10130 ≤ β1 ≤ 9.31543
b) −11.6219 ≤ β0 ≤ −1.04911
c) 500.124 ± (2.228)√[3.774609(1/12 + (55 − 46.5)²/3308.9994)]
500.124 ± 1.4037586
498.72024 ≤ μY|x0 ≤ 501.52776
d) 500.124 ± (2.228)√[3.774609(1 + 1/12 + (55 − 46.5)²/3308.9994)]
500.124 ± 4.5505644
495.57344 ≤ y0 ≤ 504.67456
The prediction interval is wider because it includes the error of the fitted model as well as the error associated with a future observation.
d) 28.225 ± (2.101)√[13.39232(1 + 1/20 + (150 − 149.3)²/48436.256)]
28.225 ± 7.87863
20.3814 ≤ y0 ≤ 36.1386
d) 1886.154 ± (2.101)√[9811.21(1 + 1/20 + (20 − 13.3375)²/1114.6618)]
1886.154 ± 217.25275
1668.9013 ≤ y0 ≤ 2103.4067
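The prediction-interval arithmetic can be checked the same way (a sketch using the Exercise 11-12 quantities quoted above; note the extra "1 +" that distinguishes a prediction interval from a confidence interval on the mean):

```python
import math

# 95% prediction interval on a future observation at x0 = 20.
n, xbar, Sxx, sigma2 = 20, 13.3375, 1114.6618, 9811.21
t_025_18 = 2.101                   # t_{0.025,18} from the t-table
y_hat = 1886.154                   # fitted value at x0 = 20
x0 = 20.0

half_width = t_025_18 * math.sqrt(sigma2 * (1 + 1/n + (x0 - xbar)**2 / Sxx))
lo, hi = y_hat - half_width, y_hat + half_width
print(half_width, lo, hi)
```

The half-width reproduces the 217.25275 above and the bounds (1668.90, 2103.41).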
Section 11-7
11-42. Use the results of Exercise 11-4 to answer the following questions.
a) R² = 0.544684; the proportion of variability explained by the model.
R²Adj = 1 − (148.87197/26)/(326.96429/27) = 1 − 0.473 = 0.527
b) Yes, normality seems to be satisfied since the data appear to fall along the straight line.
[Normal probability plot of residuals]
c) Since the residual plots appear random, they do not reveal any serious model inadequacies.
[Plots of residuals versus predicted values and residuals versus Yards]
11-43. Use the results of Exercise 11-5 to answer the following questions.
a) SalePrice Taxes Predicted Residuals
25.9 4.9176 29.6681073 -3.76810726
29.5 5.0208 30.0111824 -0.51118237
27.9 4.5429 28.4224654 -0.52246536
25.9 4.5573 28.4703363 -2.57033630
29.9 5.0597 30.1405004 -0.24050041
29.9 3.8910 26.2553078 3.64469225
30.9 5.8980 32.9273208 -2.02732082
28.9 5.6039 31.9496232 -3.04962324
35.9 5.8282 32.6952797 3.20472030
31.5 5.3003 30.9403441 0.55965587
31.0 6.2712 34.1679762 -3.16797616
30.9 5.9592 33.1307723 -2.23077234
30.0 5.0500 30.1082540 -0.10825401
36.9 8.2464 40.7342742 -3.83427422
41.9 6.6969 35.5831610 6.31683901
40.5 7.7841 39.1974174 1.30258260
43.9 9.0384 43.3671762 0.53282376
37.5 5.9894 33.2311683 4.26883165
37.9 7.5422 38.3932520 -0.49325200
44.5 8.7951 42.5583567 1.94164328
37.9 6.0831 33.5426619 4.35733807
38.9 8.3607 41.1142499 -2.21424985
36.9 8.1400 40.3805611 -3.48056112
45.8 9.1416 43.7102513 2.08974865
b) Assumption of normality does not seem to be violated since the data appear to fall along a straight line.
[Normal probability plot of residuals]
c) There are no serious departures from the assumption of constant variance. This is evident by the random pattern of
the residuals.
[Plots of residuals versus predicted values and residuals versus Taxes]
d) R² = 76.73%
11-44. Use the results of Exercise 11-6 to answer the following questions
a) R 2 = 99.986% ; The proportion of variability explained by the model.
b) Yes, normality seems to be satisfied since the data appear to fall along the straight line.
[Normal probability plot of residuals]
c) There might be lower variance at the middle settings of x. However, these data do not indicate a serious departure from the assumptions.
[Plots of residuals versus predicted values and residuals versus Temperature]
11-45. a) R² = 20.1121%
b) These plots indicate presence of outliers, but no real problem with assumptions.
[Plots of residuals versus x and residuals versus fitted values (response is y)]
c) The normality assumption appears marginal.
[Normal probability plot of residuals (response is y)]
11-46. a) [Scatter plot of y versus x]
ŷ = 0.677559 + 0.0521753x
b) H0 : β1 = 0 H 1 : β1 ≠ 0 α = 0.05
f 0 = 7.9384
f.05,1,12 = 4.75
f 0 > fα ,1,12
Reject H0.
c) σˆ 2 = 25.23842
d) σ̂²orig = 7.324951
The new estimate is larger because the new point added additional variance not accounted for by the
model.
e) Yes, e14 is especially large compared to the other residuals.
f) The one added point is an outlier and the normality assumption is not as valid with the point included.
[Normal probability plot of residuals (response is y)]
g) Constant variance assumption appears valid except for the added point.
[Plots of residuals versus fitted values and residuals versus x (response is y)]
11-47. a) R 2 = 71.27%
b) No major departure from normality assumptions.
[Normal probability plot of residuals (response is y)]
[Plots of residuals versus x and residuals versus fitted values]
11-48. a) R² = 0.879397
b) No departures from constant variance are noted.
[Plots of residuals versus x and residuals versus fitted values (response is y)]
11-24
c) Normality assumption appears reasonable.
[Normal probability plot of residuals (response is y)]
11-49. a) R² = 85.22%
b) Assumptions appear reasonable, but there is a suggestion that variability increases slightly with y.
[Plots of residuals versus x and residuals versus fitted values (response is y)]
c) The normality assumption may be questionable. There is some "bending" away from a straight line in the tails of the normal probability plot.
[Normal probability plot of residuals (response is y)]
11-50. a) R² = 0.896081; about 89% of the variability is explained by the model.
b) Yes, the two points with residuals much larger in magnitude than the others.
[Normal probability plot of residuals]
11-51. Using R² = 1 − SSE/Syy,
F0 = (n − 2)R²/(1 − R²) = (n − 2)(1 − SSE/Syy)/(SSE/Syy) = (Syy − SSE)/(SSE/(n − 2)) = (Syy − SSE)/σ̂²
Also,
SSE = Σ(yi − β̂0 − β̂1xi)²
= Σ(yi − ȳ − β̂1(xi − x̄))²
= Σ(yi − ȳ)² + β̂1²Σ(xi − x̄)² − 2β̂1Σ(yi − ȳ)(xi − x̄)
= Σ(yi − ȳ)² − β̂1²Σ(xi − x̄)²
so that
Syy − SSE = β̂1²Σ(xi − x̄)²
Therefore, F0 = β̂1²/(σ̂²/Sxx) = t0²
Because the square of a t random variable with n − 2 degrees of freedom is an F random variable with 1 and n − 2 degrees of freedom, the usual t-test that compares |t0| to tα/2,n−2 is equivalent to comparing f0 = t0² to fα,1,n−2 = t²α/2,n−2.
11-52. a) f0 = (0.9/(1 − 0.9))(23) = 207. Reject H0: β1 = 0.
b) Because f0.05,1,23 = 4.28, H0 is rejected if 23R²/(1 − R²) > 4.28.
That is, H0 is rejected if
23R² > 4.28(1 − R²)
27.28R² > 4.28
R² > 0.157
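Both parts of 11-52 follow from the identity f0 = (n − 2)R²/(1 − R²) derived in 11-51, and can be checked numerically (a sketch, not part of the original solution):

```python
# ANOVA F-statistic from R^2 alone: f0 = (n - 2) * R^2 / (1 - R^2).
def f_from_r2(r2, n):
    return (n - 2) * r2 / (1 - r2)

f0 = f_from_r2(0.90, 25)           # part a)

# Part b): smallest R^2 significant at alpha = 0.05, where f crit = 4.28.
# Solving 23*R^2 = 4.28*(1 - R^2) gives R^2 = 4.28 / 27.28.
r2_min = 4.28 / (23 + 4.28)
print(f0, r2_min)
```

The results reproduce f0 = 207 and the threshold R² ≈ 0.157.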
Section 11-9
d) H 0 : ρ = 0
H1 : ρ ≠ 0 α = 0.05
t0 = R√(n − 2)/√(1 − R²) = 0.90334√18/√(1 − 0.816) = 8.9345
t0.025,18 = 2.101
t0 > t0.025,18
Reject H0 .
e) H 0 : ρ = 0 . 5
H 1 : ρ ≠ 0 .5 α = 0.05
z 0 = 3 . 879
z .025 = 1 . 96
z 0 > zα / 2
Reject H0 .
f) tanh(arctanh 0.90334 − z0.025/√17) ≤ ρ ≤ tanh(arctanh 0.90334 + z0.025/√17), where z0.025 = 1.96.
0.7677 ≤ ρ ≤ 0.9615
e) H 0 : ρ = 0 . 6
H 1 : ρ ≠ 0 .6 α = 0.05
z0 = (arctanh 0.77349 − arctanh 0.6)√23 = 1.6105
z .025 = 1 .96
Since z0 < z0.025, do not reject H0.
f) tanh(arctanh 0.77349 − z0.025/√23) ≤ ρ ≤ tanh(arctanh 0.77349 + z0.025/√23), where z0.025 = 1.96.
0.5513 ≤ ρ ≤ 0.8932
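These Fisher z-transform intervals are mechanical to compute; a short Python sketch (checking the n = 20, r = 0.90334 interval from part f) above):

```python
import math

# 95% CI for rho via the Fisher z-transform:
# tanh(atanh(r) -/+ z_{alpha/2} / sqrt(n - 3)).
def rho_ci(r, n, z=1.96):
    center = math.atanh(r)
    delta = z / math.sqrt(n - 3)
    return math.tanh(center - delta), math.tanh(center + delta)

lo, hi = rho_ci(0.90334, 20)
print(lo, hi)
```

The bounds reproduce 0.7677 ≤ ρ ≤ 0.9615; note how the nonlinearity of tanh makes the interval asymmetric about r.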
11-57. a) r = -0.738027
b) H 0 : ρ = 0
H1 : ρ ≠ 0 α = 0.05
t0 = −0.738027√26/√(1 − 0.5447) = −5.577
t0.025,26 = 2.056
|t0| > t0.025,26
Reject H0 . P-value = (3.69E-6)(2) = 7.38E-6
c) tanh(arctanh(−0.738) − z0.025/√25) ≤ ρ ≤ tanh(arctanh(−0.738) + z0.025/√25)
11-58. R = β̂1(Sxx/Syy)^(1/2) and 1 − R² = SSE/Syy.
Therefore,
T0 = R√(n − 2)/√(1 − R²) = β̂1(Sxx/Syy)^(1/2)√(n − 2)/√(SSE/Syy) = β̂1/√(σ̂²/Sxx), where σ̂² = SSE/(n − 2).
11-59. n = 50, r = 0.62
a) H 0 : ρ = 0
H1 : ρ ≠ 0 α = 0.01
t0 = r√(n − 2)/√(1 − r²) = 0.62√48/√(1 − 0.62²) = 5.475
t .005 , 48 = 2 . 682
t 0 > t 0 .005 , 48
Reject H0 . P-value ≅ 0
b) tanh(arctanh 0.62 − z0.005/√47) ≤ ρ ≤ tanh(arctanh 0.62 + z0.005/√47)
t.025,9998 = 1.96
t 0 > tα / 2,9998
Reject H0 . P-value = 2(0.02274) = 0.04548
b) Since the sample size is so large, the standard error is very small. Therefore, very small differences are
found to be "statistically" significant. However, the practical significance is minimal since r = 0.02 is
essentially zero.
11-61. a) r = 0.933203
b) H0 :ρ = 0
H1 : ρ ≠ 0 α = 0.05
t0 = r√(n − 2)/√(1 − r²) = 0.933203√15/√(1 − 0.8709) = 10.06
Since t0 > t0.025,15 = 2.131, reject H0.
c) yˆ = 0.72538 + 0.498081x
H 0 : β1 = 0
H 1 : β1 ≠ 0 α = 0.05
f 0 = 101 . 16
f .05 ,1 ,15 = 4 . 543
f 0 >> f α ,1 ,15
Reject H0. Conclude that the model is significant at α = 0.05. This test and the one in part b are identical.
d) H0 : β0 = 0
H1 : β0 ≠ 0 α = 0.05
t0 = 0.468345
t0.025,15 = 2.131
Since t0 < t0.025,15, do not reject H0.
[Plots of residuals versus x, residuals versus fitted values, and normal probability plot of residuals]
11-62. n = 25 r = 0.83
a) H 0 : ρ = 0
H 1 : ρ ≠ 0 α = 0.05
t0 = r√(n − 2)/√(1 − r²) = 0.83√23/√(1 − 0.83²) = 7.137
t0.025,23 = 2.069
t0 > t0.025,23
Reject H0 . P-value = 0.
b) tanh(arctanh 0.83 − z0.025/√22) ≤ ρ ≤ tanh(arctanh 0.83 + z0.025/√22), where z0.025 = 1.96.
0.6471 ≤ ρ ≤ 0.9226
c) H 0 : ρ = 0. 8
H 1 : ρ ≠ 0 .8 α = 0.05
z0 = (arctanh 0.83 − arctanh 0.8)√22 = 0.4199
z.025 = 1.96
Since z0 < z0.025, do not reject H0. P-value = (0.3373)(2) = 0.6746.
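The test statistic in part a) follows directly from the sample correlation; a quick numerical check of the n = 25, r = 0.83 case (not part of the original solution):

```python
import math

# t-test of H0: rho = 0 from a sample correlation:
# t0 = r * sqrt(n - 2) / sqrt(1 - r^2).
def corr_t(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

t0 = corr_t(0.83, 25)
print(t0)
```

The result agrees with the t0 = 7.137 computed in part a).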
Supplemental Exercises
11-63. a) Σ(yi − ŷi) = Σyi − Σŷi (sums over i = 1, …, n), and Σyi = nβ̂0 + β̂1Σxi from the normal equations. Then,
Σ(yi − ŷi) = (nβ̂0 + β̂1Σxi) − Σ(β̂0 + β̂1xi)
= nβ̂0 + β̂1Σxi − nβ̂0 − β̂1Σxi = 0
b) Σ(yi − ŷi)xi = Σyixi − Σŷixi, and Σyixi = β̂0Σxi + β̂1Σxi² from the normal equations. Then,
Σ(yi − ŷi)xi = β̂0Σxi + β̂1Σxi² − Σ(β̂0 + β̂1xi)xi
= β̂0Σxi + β̂1Σxi² − β̂0Σxi − β̂1Σxi² = 0
c) (1/n)Σŷi = ȳ:
(1/n)Σŷi = (1/n)Σ(β̂0 + β̂1xi)
= (1/n)(nβ̂0 + β̂1Σxi)
= (1/n)(n(ȳ − β̂1x̄) + β̂1Σxi)
= (1/n)(nȳ − nβ̂1x̄ + β̂1Σxi)
= ȳ − β̂1x̄ + β̂1x̄
= ȳ
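The identities in parts a) and b) hold for any least-squares fit and are easy to confirm numerically (illustrative data, not from any exercise):

```python
# Numerical check of the Exercise 11-63 identities: for a least-squares
# fit, the residuals satisfy sum(e) = 0 and sum(e * x) = 0.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar)**2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(e), sum(ei * xi for ei, xi in zip(e, x)))
```

Both printed sums are zero up to floating-point error, exactly as the normal equations require.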
11-64. a) [Scatter plot of y versus x]
b) Model fitting results for: y
Independent variable coefficient std. error t-value sig.level
CONSTANT -0.966824 0.004845 -199.5413 0.0000
x 1.543758 0.003074 502.2588 0.0000
--------------------------------------------------------------------------------
R-SQ. (ADJ.) = 1.0000 SE= 0.002792 MAE= 0.002063 DurbWat= 2.843
Previously: 0.0000 0.000000 0.000000 0.000
10 observations fitted, forecast(s) computed for 0 missing val. of dep. var.
ŷ = −0.966824 + 1.54376x
c) Analysis of Variance for the Full Regression
Source Sum of Squares DF Mean Square F-Ratio P-value
Model 1.96613 1 1.96613 252264. .0000
Error 0.0000623515 8 0.00000779394
--------------------------------------------------------------------------------
Total (Corr.) 1.96619 9
R-squared = 0.999968 Stnd. error of est. = 2.79176E-3
R-squared (Adj. for d.f.) = 0.999964 Durbin-Watson statistic = 2.84309
2) H 0 :β1 = 0
3) H 1:β1 ≠ 0
4) α = 0.05
5) The test statistic is f0 = (SSR/k)/(SSE/(n − p))
6) Reject H0 if f0 > fα,1,8 where f0.05,1,8 = 5.32
7) Using the results from the ANOVA table
f0 = (1.96613/1)/(0.0000623515/8) = 252263.9
8) Since 252263.9 > 5.32, reject H0 and conclude that the regression model is significant at α = 0.05.
P-value < 0.000001
d) 95 percent confidence intervals for coefficient estimates
--------------------------------------------------------------------------------
Estimate Standard error Lower Limit Upper Limit
CONSTANT -0.96682 0.00485 -0.97800 -0.95565
x 1.54376 0.00307 1.53667 1.55085
--------------------------------------------------------------------------------
−0.97800 ≤ β 0 ≤ −0.95565
e) 2) H 0 :β 0 = 0
3) H 1:β 0 ≠ 0
4) α = 0.05
5) The test statistic is t0 = β̂0/se(β̂0)
6) Reject H0 if t0 < −tα/2,n-2 where −t0.025,8 = −2.306 or t0 > t0.025,8 = 2.306
7) Using the results from the table above
t0 = −0.96682/0.00485 = −199.34
8) Since −199.34 < −2.306 reject H 0 and conclude the intercept is significant at α = 0.05.
11-65. a) yˆ = 93.34 + 15.64 x
b) H 0 : β1 = 0
H1 : β1 ≠ 0 α = 0.05
f 0 = 12.872
f .05,1,14 = 4.60
f 0 > f 0.05,1,14
Reject H0 . Conclude that β1 ≠ 0 at α = 0.05.
c) 7.961 ≤ β1 ≤ 23.322
d) 74.758 ≤ β0 ≤ 111.923
e) yˆ = 93.34 + 15.64(2.5) = 132.44
11-66. a) [Scatter plot of vapor pressure (mm Hg) versus temperature (K)]
b) ŷ = −1956.3 + 6.686x
c) Source DF SS MS F P
Regression 1 491662 491662 35.57 0.000
Residual Error 9 124403 13823
Total 10 616065
d) [Plot of residuals versus fitted values]
[Plot of ln(VP) versus 1/T]
Analysis of Variance
Source DF SS MS F P
Regression 1 28.511 28.511 66715.47 0.000
Residual Error 9 0.004 0.000
Total 10 28.515
[Plot of residuals versus fitted values (response is y*)]
There is still curvature in the data, but now the plot is convex instead of concave.
11-67. a) [Scatter plot of y versus x]
b) yˆ = −0.8819 + 0.00385 x
c) H 0 : β1 = 0
H1 : β1 ≠ 0 α = 0.05
f 0 = 122.03
f 0 > f 0.05,1, 48
Reject H0 . Conclude that regression model is significant at α = 0.05
d) No; the variance does not appear constant. The plot of residuals versus fitted values has a funnel shape.
[Plot of residuals versus fitted values (response is y)]
11-68. ŷ* = 1.2232 + 0.5075x, where y* = 1/y. No, the model does not seem reasonable. The residual plots indicate a possible outlier.
11-69. ŷ = 0.7916x
Even though y should be zero when x is zero, because the regressor variable does not normally assume values near zero, a model with an intercept fits these data better. Without an intercept, the MSE is larger because the model has fewer terms, and the residual plots are not satisfactory.
11-70. ŷ = 4.5755 + 2.2047x, r = 0.992, R² = 98.40%
The model appears to be an excellent fit. The significance of the regressor is strong and R² is large. Both regression coefficients are significant. No, the existence of a strong correlation does not imply a cause-and-effect relationship.
11-71 a) [Scatter plot of days versus index]
b) The regression equation is
yˆ = −193 + 15.296 x
Analysis of Variance
Source DF SS MS F P
Regression 1 1492.6 1492.6 2.64 0.127
Residual Error 14 7926.8 566.2
Total 15 9419.4
We cannot reject H0; therefore, we conclude that the model is not significant. The seasonal meteorological index (x) is not a reliable predictor of the number of days that the ozone level exceeds 0.20 ppm (y).
c) 95% CI on β1:
β̂1 ± tα/2,n−2 se(β̂1)
15.296 ± t0.025,14(9.421)
15.296 ± 2.145(9.421)
−4.912 ≤ β1 ≤ 35.504
d) The normal probability plot of the residuals is satisfactory. However, the plot of residuals versus run order exhibits a strong downward trend. This could indicate that another variable, one that changes with time, should be included in the model.
[Normal probability plot of residuals; plot of residuals versus observation order]
11-72 a) [Scatter plot of y versus x]
b) ŷ = 0.6714 − 0.2964x
c) Analysis of Variance
Source DF SS MS F P
Regression 1 0.03691 0.03691 1.64 0.248
Residual Error 6 0.13498 0.02250
Total 7 0.17189
R2 = 21.47%
d) There appears to be curvature in the data. There is a dip in the middle of the normal probability plot and the plot of
the residuals versus the fitted values shows curvature.
[Normal probability plot of residuals; plot of residuals versus fitted values]
11-73 The correlation coefficient for the n pairs (xi, zi) will not be near unity; it will be near zero. The pairs (xi, yi) fall along the line y = x, which has a slope near unity and gives a correlation coefficient near unity. The pairs (xi, zi), where zi = yi², do not fall along a straight line and give a much smaller correlation coefficient.
11-74 a) [Scatter plot of y versus x]
b) yˆ = −0.699 + 1.66 x
c) Source DF SS MS F P
Regression 1 28.044 28.044 22.75 0.001
Residual Error 8 9.860 1.233
Total 9 37.904
d) At x0 = 4.25, μ̂Y|x0 = 4.257
4.257 ± 2.306√[1.2324(1/10 + (4.25 − 4.75)²/20.625)]
4.257 ± 2.306(0.3717)
3.400 ≤ μY|x0 ≤ 5.114
[Normal probability plot of residuals; plot of residuals versus fitted values]
11-75 a) [Scatter plot of y versus x]
b) yˆ = 33.3 + 0.9636 x
c)Predictor Coef SE Coef T P
Constant 66.0 194.2 0.34 0.743
Therm 0.9299 0.2090 4.45 0.002
Analysis of Variance
Source DF SS MS F P
Regression 1 584.62 584.62 19.79 0.002
Residual Error 8 236.28 29.53
Total 9 820.90
Reject the null hypothesis and conclude that the model is significant. 77.3% of the variability is explained by the model.
d) H0: β1 = 1
H1: β1 ≠ 1, α = 0.05
t0 = (β̂1 − 1)/se(β̂1) = (0.9299 − 1)/0.2090 = −0.3354
tα/2,n−2 = t0.025,8 = 2.306
Since |t0| < t0.025,8, we cannot reject H0; there is not enough evidence to conclude that the devices produce different temperature measurements. Therefore, we assume the devices produce equivalent measurements.
[Normal probability plot of residuals; plot of residuals versus fitted values (response is IR)]
Mind-Expanding Exercises
11-76. a) β̂1 = SxY/Sxx, β̂0 = Ȳ − β̂1x̄
Cov(β̂0, β̂1) = Cov(Ȳ, β̂1) − x̄Cov(β̂1, β̂1)
Cov(Ȳ, β̂1) = Cov(Ȳ, SxY)/Sxx = Cov(ΣYi, ΣYi(xi − x̄))/(nSxx) = σ²Σ(xi − x̄)/(nSxx) = 0. Therefore,
Cov(β̂1, β̂1) = V(β̂1) = σ²/Sxx
Cov(β̂0, β̂1) = −x̄σ²/Sxx
b) The requested result is shown in part a.
11-77. a) E(MSE) = E(Σei²)/(n − 2) = ΣV(ei)/(n − 2)
= σ²Σ[1 − (1/n + (xi − x̄)²/Sxx)]/(n − 2)
= σ²[n − 1 − 1]/(n − 2) = σ²
b) Using the fact that SSR = MSR, we obtain
E(MSR) = E(β̂1²Sxx) = Sxx{V(β̂1) + [E(β̂1)]²} = Sxx(σ²/Sxx + β1²) = σ² + β1²Sxx
11-78. β̂1 = Sx1Y/Sx1x1
E(β̂1) = E[ΣYi(x1i − x̄1)]/Sx1x1 = Σ(β0 + β1x1i + β2x2i)(x1i − x̄1)/Sx1x1
= [β1Sx1x1 + β2Σx2i(x1i − x̄1)]/Sx1x1 = β1 + β2Sx1x2/Sx1x1
No, β̂1 is no longer unbiased.
11-79. V(β̂1) = σ²/Sxx. To minimize V(β̂1), Sxx should be maximized. Because Sxx = Σᵢ(xi − x̄)², Sxx is maximized by choosing approximately half of the observations at each end of the range of x.
From a practical perspective, this allocation assumes the linear model between Y and x holds throughout the range of x
and observing Y at only two x values prohibits verifying the linearity assumption. It is often preferable to obtain some
observations at intermediate values of x.
11-80. One might minimize a weighted sum of squares Σwi(yi − β0 − β1xi)², in which a Yi with small variance (wi large) receives greater weight in the sum of squares.
∂/∂β0 Σwi(yi − β0 − β1xi)² = −2Σwi(yi − β0 − β1xi)
∂/∂β1 Σwi(yi − β0 − β1xi)² = −2Σwi(yi − β0 − β1xi)xi
Setting these derivatives to zero yields
β̂0Σwi + β̂1Σwixi = Σwiyi
β̂0Σwixi + β̂1Σwixi² = Σwixiyi
as requested.
and
β̂1 = [(Σwixiyi)(Σwi) − (Σwixi)(Σwiyi)] / [(Σwi)(Σwixi²) − (Σwixi)²]
β̂0 = Σwiyi/Σwi − β̂1(Σwixi/Σwi).
11-81. ŷ = ȳ + r(sy/sx)(x − x̄)
= ȳ + [Sxy/√(SxxSyy)]√[Σ(yi − ȳ)²/Σ(xi − x̄)²](x − x̄)
= ȳ + (Sxy/Sxx)(x − x̄)
= ȳ + β̂1x − β̂1x̄ = β̂0 + β̂1x
11-82. a) ∂/∂β1 Σ(yi − β0 − β1xi)² = −2Σ(yi − β0 − β1xi)xi
Upon setting the derivative to zero, we obtain
β0Σxi + β1Σxi² = Σxiyi
Therefore,
β̂1 = (Σxiyi − β0Σxi)/Σxi² = Σxi(yi − β0)/Σxi²
b) V(β̂1) = V[Σxi(Yi − β0)/Σxi²] = σ²Σxi²/[Σxi²]² = σ²/Σxi²
c) β̂1 ± tα/2,n−1 √(σ̂²/Σxi²)
This confidence interval is shorter because Σxi² ≥ Σ(xi − x̄)². Also, the t value based on n − 1 degrees of freedom is slightly smaller than the corresponding t value based on n − 2 degrees of freedom.