Statistics-17 by Keller
CHAPTER 17
SIMPLE LINEAR REGRESSION
AND CORRELATION
SECTIONS 1 - 2
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, please circle the correct answer.
1.
The regression line y = 3 + 2x has been fitted to the data points (4, 8), (2, 5), and (1, 2).
The sum of the squared residuals will be:
a. 7
b. 15
c. 8
d. 22
ANSWER: d
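The arithmetic behind this answer can be checked with a short Python sketch. The function name `sum_squared_residuals` is illustrative, not from the text: each residual is y - ŷ, and the sum of squared residuals (SSE) adds up their squares.

```python
def sum_squared_residuals(b0, b1, points):
    """Return SSE for the line y_hat = b0 + b1*x over a list of (x, y) points."""
    total = 0.0
    for x, y in points:
        y_hat = b0 + b1 * x          # value predicted by the line
        total += (y - y_hat) ** 2    # add the squared residual
    return total


print(sum_squared_residuals(3, 2, [(4, 8), (2, 5), (1, 2)]))  # 22.0
```

The same function returns 10.0 for the line ŷ = 2 + 3x and the points (4, 11), (2, 7), (1, 5), matching the true/false question about that line later in this section.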
3.
Chapter Seventeen
a. the relationship between x and y is positive
b. the relationship between x and y is negative
c. as x increases, so does y
d. as x decreases, so does y
ANSWER: b
5.
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the
following least squares line: ŷ = 75 + 6x. This implies that if advertising is $800, then
the predicted amount of sales (in dollars) is:
a. $4875
b. $123,000
c. $487,500
d. $12,300
ANSWER: b
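The key step is converting dollars into the units the line was fitted in before substituting, then converting the prediction back. A minimal Python sketch; the function name and its `x_unit`/`y_unit` parameters are illustrative assumptions, not from the text:

```python
def predict_sales_dollars(b0, b1, advertising_dollars, x_unit=100, y_unit=1000):
    """Predict sales in dollars from a line fitted in scaled units.

    x_unit: dollars represented by one unit of x (advertising in $100s)
    y_unit: dollars represented by one unit of y (sales in $1000s)
    """
    x = advertising_dollars / x_unit   # $800 of advertising becomes x = 8
    y_hat = b0 + b1 * x                # predicted sales, in $1000s
    return y_hat * y_unit              # convert back to dollars


print(predict_sales_dollars(75, 6, 800))  # 123000.0
```

With the line ŷ = 77 + 8x and $600 of advertising, the same function gives $125,000.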
6.
A regression analysis between sales (in $1000) and advertising (in $) resulted in the
following least squares line: ŷ = 80,000 + 5x. This implies that an:
a. increase of $1 in advertising is expected, on average, to result in an increase of $5 in
sales
b. increase of $5 in advertising is expected, on average, to result in an increase of $5,000
in sales
c. increase of $1 in advertising is expected, on average, to result in an increase of
$80,005 in sales
d. increase of $1 in advertising is expected, on average, to result in an increase of $5,000
in sales
ANSWER: d
7.
Which of the following techniques is used to predict the value of one variable on the
basis of other variables?
a. Correlation analysis
b. Coefficient of correlation
c. Covariance
d. Regression analysis
ANSWER: d
10.
In the first order linear regression model, the population parameters of the y-intercept and
the slope are estimated respectively, by:
a. b0 and b1
b. b0 and β1
c. β0 and b1
d. β0 and β1
ANSWER: a
13.
In the first-order linear regression model, the population parameters of the y-intercept and
the slope are, respectively,
a. b0 and b1
b. b0 and β1
c. β0 and b1
d. β0 and β1
ANSWER: d
14.
In a simple linear regression problem, the following statistics are calculated from a
sample of 10 observations: Σ(x - x̄)(y - ȳ) = 2250, s_x = 10, Σx = 50, Σy = 75.
The least squares estimates of the slope and y-intercept are respectively:
a. 1.5 and 0.5
b. 2.5 and 1.5
c. 1.5 and 2.5
d. 2.5 and -5.0
ANSWER: d
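Assuming, as the garbled original suggests, that 50 and 75 are the sums Σx and Σy, the estimates follow from b1 = cov(x, y)/s_x² and b0 = ȳ - b1x̄. A minimal Python sketch with an illustrative function name:

```python
def ls_from_summary(sum_cross_dev, s_x, sum_x, sum_y, n):
    """Least squares slope and intercept from summary statistics.

    sum_cross_dev: the sum of (x - x_bar)(y - y_bar)
    s_x: the sample standard deviation of x
    """
    cov_xy = sum_cross_dev / (n - 1)      # sample covariance
    b1 = cov_xy / s_x ** 2                # slope: cov(x, y) / s_x^2
    b0 = sum_y / n - b1 * (sum_x / n)     # intercept: y_bar - b1 * x_bar
    return b1, b0


print(ls_from_summary(2250, 10, 50, 75, 10))  # (2.5, -5.0)
```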
16.
In the least squares regression line ŷ = 3 - 2x, the predicted value of y equals:
a. 1.0 when x = -1.0
b. 2.0 when x = 1.0
c. 2.0 when x = -1.0
d. 1.0 when x = 1.0
ANSWER: d
17.
The least squares method for determining the best fit minimizes:
a. total variation in the dependent variable
b. sum of squares for error
c. sum of squares for regression
d. All of the above
ANSWER: b
18.
What do we mean when we say that a simple linear regression model is statistically
useful?
a. All the statistics computed from the sample make sense
b. The model is an excellent predictor of y
c. The model is practically useful for predicting y
d. The model is a better predictor of y than the sample mean ȳ
ANSWER: d
21.
Another name for the residual term in a regression equation is random error.
ANSWER: T
22.
A simple linear regression equation is given by ŷ = 5.25 + 3.8x. The point estimate of y
when x = 4 is 20.45.
ANSWER: T
23.
The vertical spread of the data points about the regression line is measured by the y-intercept.
ANSWER: F
24.
The method of least squares requires that the sum of the squared deviations between
actual y values in the scatter diagram and y values predicted by the regression line be
minimized.
ANSWER: T
25.
A regression analysis between sales (in $1000) and advertising (in $) resulted in the
following least squares line: ŷ = 60 + 5x. This implies that an increase of $1 in
advertising is expected to result in an increase of $65 in sales.
ANSWER: F
27.
The residual r_i is defined as the difference between the actual value y_i and the estimated
value ŷ_i.
ANSWER: T
28.
The regression line ŷ = 2 + 3x has been fitted to the data points (4, 11), (2, 7), and (1, 5).
The sum of squares for error will be 10.0.
ANSWER: T
29.
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the
following least squares line: ŷ = 77 + 8x. This implies that if advertising is $600, then
the predicted amount of sales (in dollars) is $125,000.
ANSWER: T
30.
The residuals are observations of the error variable ε. Consequently, the minimized sum
of squared deviations is called the sum of squares for error, denoted SSE.
ANSWER: T
31.
Statisticians have shown that the sample y-intercept b0 and sample slope coefficient b1 are
unbiased estimators of the population regression parameters β0 and β1, respectively.
ANSWER: T
32.
If cov(x, y) = 7.5075 and s_x² = 3.5, then the sample slope coefficient is 2.145.
ANSWER: T
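The slope is simply the ratio of the sample covariance to the sample variance of x. A minimal Python sketch (illustrative function name):

```python
def slope_from_cov(cov_xy, var_x):
    """Sample slope b1 = cov(x, y) / s_x^2."""
    return cov_xy / var_x


print(round(slope_from_cov(7.5075, 3.5), 3))  # 2.145
```

Applied to the covariance output in the Oil Quality and Price narrative, 2.026750 / 21.281667 gives about 0.095235, the slope reported there.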
33.
The first order linear model is sometimes called the simple linear regression model.
ANSWER: T
34.
To create a deterministic model, we start with a probabilistic model that approximates the
relationship we want to model.
ANSWER: F
35.
The residual represents the discrepancy between the observed dependent variable and its
predicted or estimated average value.
ANSWER: T
Speed (mph)   Gas Mileage (mpg)
25            40
35            39
45            37
50            33
60            30
65            27
70            25
{Car Speed and Gas Mileage Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 50.6563 - 0.3531x
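The least squares line can be recomputed from the raw data with the formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1x̄. A minimal Python sketch; the function and variable names are illustrative, and reading speed as x and mileage as y follows the questions in this narrative:

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared x deviations and sum of cross-product deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


speed = [25, 35, 45, 50, 60, 65, 70]     # x, mph
mileage = [40, 39, 37, 33, 30, 27, 25]   # y, mpg
b0, b1 = least_squares(speed, mileage)
print(f"y_hat = {b0:.4f} + ({b1:.4f})x")
print(f"prediction at 70 mph: {b0 + b1 * 70:.4f}")
```

With unrounded coefficients the prediction at 70 mph is about 25.94 mpg; the answer to the next question (25.9393) uses the rounded line.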
37.
{Car Speed and Gas Mileage Narrative} Estimate the gas mileage of a car traveling 70
mph.
ANSWER:
When x = 70, ŷ = 25.9393 mpg
38.
x    y
1    25
2    22
3    21
4    19
5    14
6    15
7    12
8    10
9    6
10   2
Find the least squares regression line and the estimated value of y when x = 3.
ANSWER:
ŷ = 27.733 - 2.388x. When x = 3, ŷ = 20.570.
39.
x    y
3    8
2    6
5    12
4    10
5    14
Two regression models are proposed: (1) ŷ = 1.2 + 2.5x and (2) ŷ = 5.5 + 4.0x.
Using the least squares method, which of these regression models provides the better fit
to the data? Why?
ANSWER:
SSE = 4.95 and 593.25 for models (1) and (2), respectively. Because model (1) has the
smaller SSE, it fits the data better than model (2).
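Comparing candidate lines under the least squares criterion amounts to computing SSE for each and picking the smaller. A minimal Python sketch (illustrative names):

```python
def sse(b0, b1, xs, ys):
    """Sum of squares for error of the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [3, 2, 5, 4, 5]
ys = [8, 6, 12, 10, 14]
sse_1 = sse(1.2, 2.5, xs, ys)   # model (1)
sse_2 = sse(5.5, 4.0, xs, ys)   # model (2)
better = "model (1)" if sse_1 < sse_2 else "model (2)"
print(round(sse_1, 2), round(sse_2, 2), better)  # 4.95 593.25 model (1)
```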
40.
x    y
2    7
4    11
6    17
8    21
10   27
13   36
Average Daily Sunshine (hours)   Skin Cancer (per 100,000)
5                                7
7                                11
6                                9
7                                12
8                                15
6                                10
4                                7
3                                5
{Sunshine and Skin Cancer Narrative} Determine the least squares regression line.
ANSWER:
ŷ = -1.115 + 1.846x
{Sunshine and Skin Cancer Narrative} Draw a scatter diagram of the data and plot the
least squares regression line on it.
ANSWER:
[Figure: Average Daily Sunshine Line Fit Plot; Skin Cancer vs. Average Daily Sunshine, with the least squares regression line]
42.
{Sunshine and Skin Cancer Narrative} Estimate the number of skin cancer cases per 100,000
people for 6 hours of sunshine.
ANSWER:
When x = 6, ŷ = 9.961
44.
{Sunshine and Skin Cancer Narrative} What does the value of the slope of the regression
line tell you?
ANSWER:
If the amount of sunshine x increases by one hour, the incidence of skin cancer y increases
by an average of 1.846 cases per 100,000 of population.
45.
{Sunshine and Skin Cancer Narrative} Calculate the residual corresponding to the pair (x,
y) = (8, 15).
ANSWER:
e = y - ŷ = 15 - 13.653 = 1.347
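A minimal Python sketch that refits the line from the narrative data and computes every residual. With unrounded coefficients the residual at (8, 15) is about 1.346, while the answer above (1.347) uses the rounded line. The sketch also checks a standard property: when the model includes an intercept, least squares residuals sum to zero. Variable names are illustrative.

```python
xs = [5, 7, 6, 7, 8, 6, 4, 3]       # average daily sunshine, hours
ys = [7, 11, 9, 12, 15, 10, 7, 5]   # skin cancer cases per 100,000

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual e_i = y_i - y_hat_i for every observation
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
e_8_15 = 15 - (b0 + b1 * 8)

print(round(e_8_15, 3))            # 1.346 with unrounded coefficients
print(abs(sum(residuals)) < 1e-9)  # True: least squares residuals sum to zero
```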
Years of Experience   Sales
0                     7
2                     9
10                    20
3                     15
8                     18
5                     14
12                    20
7                     17
20                    30
15                    25
46.
{Sales and Experience Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram of Sales vs. Years of Experience]
{Sales and Experience Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 8.63 + 1.0817x
48.
{Sales and Experience Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional year of experience, monthly sales of a salesperson increase by an
average of $1,081.7.
49.
{Sales and Experience Narrative} Estimate the monthly sales for a salesperson with 16
years of experience.
ANSWER:
When x = 16, ŷ = 25.94
Years of Education   Income ($1000s)
16                   58
11                   40
15                   55
8                    35
12                   43
10                   41
13                   52
14                   49
{Income and Education Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram of Income vs. Years of Education]
{Income and Education Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 10.6165 + 2.9098x
52.
{Income and Education Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional year of education, the income increases by an average of $2,909.80.
53.
{Income and Education Narrative} Estimate the income of an individual with 15 years of
education.
ANSWER:
When x = 15, ŷ = 54.264 (in $1000s), or $54,264
Contestant   Years of Education   Winnings
1            11                   750
2            15                   400
3            12                   600
4            16                   350
5            11                   800
6            16                   300
7            13                   650
8            14                   400
54.
{Game Winnings and Education Narrative} Draw a scatter diagram of the data to
determine whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram of Winnings vs. Years of Education]
{Game Winnings and Education Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 1735 - 89.1667x
56.
{Game Winnings and Education Narrative} Interpret the value of the slope of the
regression line.
ANSWER:
For each additional year of education a contestant has, his or her winnings on TV game
shows decrease by an average of approximately $89.17.
{Game Winnings and Education Narrative} Estimate the game winnings for a contestant
with 15 years of education.
ANSWER:
When x = 15, ŷ = $397.50
58.
Gross Revenue ($ millions): 48, 65, 18, 20, 31, 26, 73, 23, 39, 58
{Movie Revenues Narrative} Draw a scatter diagram of the data to determine whether a
linear model appears to be appropriate.
ANSWER: It appears that a linear model is appropriate.
[Figure: Scatter Diagram of Gross Revenue]
59.
{Movie Revenues Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 4.225 + 8.285x
60.
{Movie Revenues Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional million dollars paid to the two highest-paid performers, the gross revenue
of the movie increases by an average of $8.285 million.
61.
{Movie Revenues Narrative} Estimate the gross revenue of a movie if the two highest
paid performers received 6 million dollars.
ANSWER:
When x = 6, ŷ = $53.935 million
Number of Pages: 844, 727, 360, 915, 295, 706, 410, 905, 1058, 865, 677, 912
{Cost of Books Narrative} Draw a scatter diagram of the data and plot the least squares
regression line on it.
ANSWER:
[Figure: Scatter diagram of Selling Price vs. Number of Pages, with the least squares regression line]
64.
{Cost of Books Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional page, the price of a book increases by an average of about 4 cents.
65.
{Cost of Books Narrative} Estimate the selling price of a 650-page book.
ANSWER:
When x = 650, ŷ = $46.037
Precipitation (inches)   Number of Accidents
0.05                     5
0.12                     6
0.05                     2
0.08                     4
0.10                     8
0.35                     14
0.15                     7
0.30                     13
0.10                     7
0.20                     10
66.
{Accidents and Precipitation Narrative} Find the least squares regression line.
ANSWER:
ŷ = 2.3704 + 34.864x
67.
{Accidents and Precipitation Narrative} Estimate the number of accidents in a day with
0.25 inches of precipitation.
ANSWER:
When x = 0.25, ŷ = 11.09 ≈ 11 accidents
68.
{Accidents and Precipitation Narrative} What does the slope of the least squares
regression line tell you?
ANSWER:
For each additional inch of precipitation, the number of accidents on average increases by
34.864 (about 35 accidents).
Age   Number of Concerts
62    6
57    5
40    4
49    3
67    5
54    5
43    2
65    6
54    3
41    1
44    3
48    2
55    4
60    5
59    4
63    5
69    4
40    2
38    1
52    3
DESCRIPTIVE STATISTICS

                     Age       Concerts
Mean                 53        3.65
Standard Error       2.1849    0.3424
Standard Deviation   9.7711    1.5313
Sample Variance      95.4737   2.3447
Count                20        20

Regression Statistics
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

ANOVA
             df   SS         MS         F          Significance F
Regression    1   28.65711   28.65711   32.45653   2.1082E-05
Residual     18   15.89289   0.88294
Total        19   44.55

             Coefficients   t Stat     P-value   Lower 95%   Upper 95%
Intercept    -3.0115        -2.53491   0.02074   -5.50746    -0.5156
Age          0.1257         5.69706    0.00002   0.07934     0.1720
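The summary figures in this output are tied together by standard identities: R Square = SSR/SST, Standard Error = √MSE, F = MSR/MSE, and, in simple regression, F equals the square of the slope's t Stat. A minimal Python check using only numbers printed above (variable names are illustrative):

```python
import math

ss_reg, ss_res, ss_total = 28.65711, 15.89289, 44.55
df_reg, df_res = 1, 18
t_slope = 5.69706                        # t Stat for Age

ms_res = ss_res / df_res                 # MS for the residuals (about 0.88294)
r_square = ss_reg / ss_total             # R Square = SSR / SST
std_error = math.sqrt(ms_res)            # Standard Error = sqrt(MSE)
f_stat = (ss_reg / df_reg) / ms_res      # F = MSR / MSE

print(round(r_square, 5), round(std_error, 5), round(f_stat, 3))
print(round(t_slope ** 2, 3))            # in simple regression, F = t^2 for the slope
```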
{Willie Nelson Concert Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate to describe the relationship between the
age and number of concerts attended by the respondents.
ANSWER:
[Figure: Scatter Diagram of Number of Concerts vs. Age]
A linear model appears to be appropriate to describe the relationship between the age and
number of concerts attended by the respondents.
70.
{Willie Nelson Concert Narrative} Determine the least squares regression line.
ANSWER:
ŷ = -3.0115 + 0.1257x
71.
{Willie Nelson Concert Narrative} Plot the least squares regression line on the scatter
diagram.
ANSWER:
[Figure: Scatter Diagram with Trendline; Number of Concerts vs. Age]
72.
{Willie Nelson Concert Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional year of age, the number of concerts attended increases on average by
0.1257. Equivalently we may say, for every additional 20 years of age, the number of
concerts attended increases on average by about 2.50.
73.
{Willie Nelson Concert Narrative} Estimate the number of Willie Nelson concerts
attended by a 64 year old person.
ANSWER:
When x = 64, ŷ = 5.03 (about 5 concerts)
Regression Analysis

Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060

Covariances
           Degrees     Price
Degrees    21.281667
Price      2.026750    0.208833

Variable   StDev   SE
Degrees    4.613   1.280
Price      0.457   0.127
{Oil Quality and Price Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate to describe the relationship between the
quality of oil and price per barrel.
ANSWER:
[Figure: Scatter Diagram of Price vs. Degrees]
A linear model appears to be appropriate to describe the relationship between the quality
of oil and price per barrel.
75.
{Oil Quality and Price Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 9.4349 + 0.095235x
76.
{Oil Quality and Price Narrative} Plot the least squares regression line on the scatter
diagram.
ANSWER:
[Figure: Scatter Diagram of Price vs. Degrees, with the least squares regression line]
77.
{Oil Quality and Price Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional API gravity degree, the price of oil per barrel increases by an
average of 9.52 cents.
SECTIONS 3 - 4
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, please circle the correct answer.
78.
In a simple linear regression problem, the following sums of squares are produced:
Σ(y_i - ȳ)² = 200, Σ(y_i - ŷ_i)² = 50, and Σ(ŷ_i - ȳ)² = 150. The percentage of the
variation in y that is explained by the variation in x is:
a. 25%
b. 75%
c. 33%
d. 50%
ANSWER: b
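The explained share of variation is SSR/SST, where the total sum of squares splits as SST = SSR + SSE. A minimal Python sketch (illustrative function name):

```python
def coefficient_of_determination(sse, ssr):
    """R^2 = SSR / SST, using SST = SSR + SSE."""
    sst = ssr + sse
    return ssr / sst


print(coefficient_of_determination(50, 150))  # 0.75
```

The same ratio gives 0.70 for SSE = 60 and SSR = 140, and 0.90 for SSE = 60 and SSR = 540, matching the later questions that use those sums.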
79.
In simple linear regression, most often we perform a two-tail test of the population slope
β1 to determine whether there is sufficient evidence to infer that a linear relationship
exists. The null hypothesis is stated as:
a. H0: β1 = 0
b. H0: β1 = b1
c. H0: β1 = r
d. H0: β1 = s
ANSWER: a
80.
Testing whether the slope of the population regression line could be zero is equivalent to
testing whether the:
a. sample coefficient of correlation could be zero
b. standard error of estimate could be zero
c. population coefficient of correlation could be zero
d. sum of squares for error could be zero
ANSWER: c
81.
Given that s_x² = 500, s_y² = 750, cov(x, y) = 100, and n = 6, the standard error of
estimate is:
a. 12.247
b. 24.933
c. 30.2076
d. 11.180
ANSWER: c
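From summary statistics, SSE = (n - 1)(s_y² - cov(x, y)²/s_x²) and the standard error of estimate is s = √(SSE/(n - 2)). A minimal Python sketch with illustrative function names:

```python
import math


def sse_from_summary(var_x, var_y, cov_xy, n):
    """SSE = (n - 1) * (s_y^2 - cov(x, y)^2 / s_x^2)."""
    return (n - 1) * (var_y - cov_xy ** 2 / var_x)


def std_error_of_estimate(var_x, var_y, cov_xy, n):
    """s = sqrt(SSE / (n - 2))."""
    return math.sqrt(sse_from_summary(var_x, var_y, cov_xy, n) / (n - 2))


print(round(std_error_of_estimate(500, 750, 100, 6), 4))  # 30.2076
```

For the later true/false statement with cov(x, y) = 10, s_y² = 15, s_x² = 8, and n = 12, the same formulas give SSE = 27.5 and s ≈ 1.658; the value 2.75 is s², not s.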
82.
a. r
b.
c. r²
d. 2
ANSWER: b
83.
Given that the sum of squares for error is 60 and the sum of squares for regression is 140,
then the coefficient of determination is:
a. 0.429
b. 0.300
c. 0.700
d. 0.837
ANSWER: c
84.
A regression line using 25 observations produced SSR = 118.68 and SSE = 56.32. The
standard error of estimate was:
a. 2.1788
b. 1.5648
c. 1.5009
d. 2.2716
ANSWER: b
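The standard error of estimate and SSE convert into each other through the n - 2 degrees of freedom. A minimal Python sketch (illustrative function names):

```python
import math


def std_error_from_sse(sse, n):
    """Standard error of estimate s = sqrt(SSE / (n - 2))."""
    return math.sqrt(sse / (n - 2))


def sse_from_std_error(s, n):
    """Invert the formula: SSE = s^2 * (n - 2)."""
    return s ** 2 * (n - 2)


print(round(std_error_from_sse(56.32, 25), 4))  # 1.5648
print(sse_from_std_error(20, 10))                # 3200
```

With s = 20 and n = 8, the inverse formula gives SSE = 2,400.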
86.
Given the least squares regression line ŷ = -2.48 + 1.63x, and a coefficient of
determination of 0.81, the coefficient of correlation is:
a. -0.85
b. 0.85
c. -0.90
d. 0.90
ANSWER: d
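Because R² discards the sign, r is recovered as ±√R², taking the sign of the slope b1. A minimal Python sketch (illustrative function name):

```python
import math


def correlation_from_r2(r_squared, slope):
    """r has magnitude sqrt(R^2) and the same sign as the slope b1."""
    return math.copysign(math.sqrt(r_squared), slope)


print(correlation_from_r2(0.81, 1.63))   # about 0.90
print(correlation_from_r2(0.81, -1.25))  # about -0.90
```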
87.
Which value of the coefficient of correlation r indicates a stronger correlation than 0.65?
a. 0.55
b. -0.75
c. 0.60
d. -0.45
ANSWER: b
88.
If the coefficient of determination is 0.975, then the slope of the regression line:
a. must be positive
90.
The sample correlation coefficient between x and y is 0.375. It has been found that
the p-value is 0.744 when testing H0: ρ = 0 against the one-sided alternative H1: ρ < 0.
To test H0: ρ = 0 against the two-sided alternative H1: ρ ≠ 0 at a significance level
of 0.193, the p-value is:
a. 0.372
b. 1.488
c. 0.256
d. 0.512
ANSWER: d
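Because the t statistic for H0: ρ = 0 has a symmetric distribution, one- and two-sided p-values convert into each other: the two-sided p-value is twice the smaller tail, and a one-sided p-value is half the two-sided one only when the sample r lies on the side of H1. A minimal Python sketch with illustrative function names:

```python
def two_sided_from_one_sided(p_one_sided):
    """Two-sided p-value from a one-sided p-value (symmetric test statistic)."""
    return 2 * min(p_one_sided, 1 - p_one_sided)


def one_sided_from_two_sided(p_two_sided, r_agrees_with_h1):
    """One-sided p-value; halve the two-sided value only if r points toward H1."""
    half = p_two_sided / 2
    return half if r_agrees_with_h1 else 1 - half


print(round(two_sided_from_one_sided(0.744), 3))        # 0.512
print(round(one_sided_from_two_sided(0.256, True), 3))  # 0.128
```

With r = 0.375 > 0, a one-sided p-value of 0.744 can only belong to H1: ρ < 0, so the two-sided p-value is 2 × 0.256 = 0.512.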
92.
If the coefficient of correlation is 0.80, then the percentage of the variation in y that is
explained by the variation in x is:
a. -80%
b. 64%
c. 80%
d. -64%
ANSWER: b
93.
If all the points in a scatter diagram lie on the least squares regression line, then the
coefficient of correlation must be:
a. -1.0
b. 1.0
c. either -1.0 or 1.0
d. 0.0
ANSWER: c
94.
If the coefficient of correlation is 0.60, then the coefficient of determination is:
a. -0.60
b. -0.36
c. 0.36
d. 0.40
ANSWER: c
96.
If the coefficient of correlation between x and y is close to 1.0, this indicates that:
a. y causes x to happen
b. x causes y to happen
c. both (a) and (b)
d. there may or may not be any causal relationship between x and y
ANSWER: d
97.
For the values of the coefficient of determination listed below, which one implies the
greatest value of the sum of squares for regression given that the total variation in y is
1800?
a. 0.69
b. 0.96
c. 0.58
d. 0.85
ANSWER: b
98.
When all the actual and predicted values of y are equal, the standard error of estimate will
be:
a. -1.0
b. 1.0
c. 0.0
d. 2.0
ANSWER: c
99.
Which of the following statistics and procedures can be used to determine whether a
linear model should be employed?
a. The standard error of estimate
b. The coefficient of determination
c. The t-test of the slope
d. All of the above
ANSWER: d
102.
If the standard error of estimate s = 20 and n = 10, then the sum of squares for error,
SSE, is:
a. 400
b. 3200
c. 4000
d. 40000
ANSWER: b
103.
The smallest value that the standard error of estimate s can assume is:
a. -1
b. 0
c. 1
d. 2
ANSWER: b
104.
If cov(x, y) = 1260, s_x² = 1600, and s_y² = 1225, then the coefficient of determination is:
a. 0.7875
b. 1.0286
c. 0.8100
d. 0.7656
ANSWER: c
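In simple regression R² = r², and r = cov(x, y)/(s_x s_y), so R² = cov(x, y)²/(s_x² s_y²). A minimal Python sketch (illustrative function name):

```python
def r2_from_cov(cov_xy, var_x, var_y):
    """Coefficient of determination R^2 = cov(x, y)^2 / (s_x^2 * s_y^2)."""
    return cov_xy ** 2 / (var_x * var_y)


print(round(r2_from_cov(1260, 1600, 1225), 2))  # 0.81
print(round(r2_from_cov(8.5, 10, 8), 3))        # 0.903
```

The second call corresponds to the later true/false statement with cov(x, y) = 8.5, s_y² = 8, and s_x² = 10, where R² is about 0.903 rather than 0.95.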
105.
b. variation of x around the regression line
c. variation of y around the mean y
d. variation of x around the mean x
ANSWER: a
109.
In a regression problem the following pairs of (x,y) are given: (3,1), (3,-1), (3,0), (3,-2)
and (3,2). That indicates that the:
a. correlation coefficient is -1
b. correlation coefficient is 0
c. correlation coefficient is 1
d. coefficient of determination is between -1 and 1
ANSWER: b
111.
The sample correlation coefficient between x and y is 0.375. It has been found that
the p-value is 0.256 when testing H0: ρ = 0 against the two-sided alternative
114.
In a regression problem, if all the values of the independent variable are equal, then the
coefficient of determination must be:
a. -1.0
b. 0.5
c. 0.0
d. 1.0
ANSWER: c
116.
In simple linear regression, the coefficient of correlation r and the least squares estimate
b1 of the population slope β1:
a. must be equal
b. must have opposite signs
c. must have the same sign
d. may have opposite signs or the same sign
ANSWER: c
122.
The sample correlation coefficient between x and y is 0.375. It has been found that
the p-value is 0.256 when testing H0: ρ = 0 against a two-sided alternative H1: ρ ≠ 0.
To test H0: ρ = 0 against the one-sided alternative H1: ρ > 0 at a significance level of
0.193, the p-value will be equal to:
Which of the following is not a required condition for the error variable ε in the simple
linear regression model?
a. The probability distribution of ε is normal.
b. The mean of the probability distribution of ε is zero.
c. The standard deviation of ε is a constant no matter what the value of x.
d. The values of ε are autocorrelated.
ANSWER: d
126.
If the coefficient of correlation is 0.90, then the percentage of the variation in the
dependent variable y that is explained by the variation in the independent variable x is:
a. 90%
b. 81%
c. 0.90%
d. 0.81%
ANSWER: b
127.
If a researcher wanted to find out whether alcohol consumption and grade point average on a
4-point scale are linearly related, he would perform a:
a. χ² test for the difference in two proportions
b. χ² test for independence
c. z test for the difference in two proportions
d. t test for no linear relationship between the two variables
ANSWER: d
If the value of the sum of squares for error SSE equals zero, then the coefficient of
determination must equal zero.
ANSWER: F
129.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of estimate will be 1.0.
ANSWER: F
130.
The value of the sum of squares for regression SSR can never be smaller than 0.0.
ANSWER: T
131.
The value of the sum of squares for regression SSR can never be smaller than 1.
ANSWER: F
132.
If all the values of an independent variable x are equal, then regressing a dependent
variable y on x will result in a coefficient of determination of zero.
ANSWER: T
133.
In a simple linear regression model, testing whether the slope β1 of the population
regression line could be zero is the same as testing whether or not the population
coefficient of correlation equals zero.
ANSWER: T
134.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of estimate s will be 0.0.
ANSWER: T
136.
The value of the sum of squares for regression SSR can never be larger than the value of
sum of squares for error SSE.
ANSWER: F
137.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of estimate s will be -1.0.
ANSWER: F
138.
In a simple linear regression problem, the least squares line is ŷ = -3.75 + 1.25x, and
the coefficient of determination is 0.81. The coefficient of correlation must be -0.90.
ANSWER: F
140.
In a regression problem the following pairs of (x, y) are given: (4,-2), (4,-1), (4,0), (4,1)
and (4,2). That indicates that the coefficient of correlation is 1.
ANSWER: F
141.
The value of the sum of squares for regression SSR can never be larger than the value of
total sum of squares SST.
ANSWER: T
144.
If the coefficient of correlation is 0.81, then the percentage of the variation in y that is
explained by the regression line is 81%.
ANSWER: F
145.
If all the points in a scatter diagram lie on the least squares regression line, then the
coefficient of correlation must be 1.0.
ANSWER: F
146.
If the standard error of estimate s = 20 and n = 8, then the sum of squares for error SSE
is 2,400.
ANSWER: T
147.
The probability distribution of the error variable ε is normal, with mean E(ε) = 0 and
standard deviation σ_ε = 1.
ANSWER: F
149.
Given that cov(x, y) = 10, s_y² = 15, s_x² = 8, and n = 12, the value of the standard error of
estimate s is 2.75.
ANSWER: F
If the error variable is normally distributed, the test statistic for testing H0: β1 = 0 is
Student t distributed with n - 2 degrees of freedom.
ANSWER: T
151.
Given that cov(x, y) = 8.5, s_y² = 8, and s_x² = 10, then the value of the coefficient of
determination is 0.95.
ANSWER: F
153.
Given that SSE = 60 and SSR = 540, the proportion of the variation in y that is explained
by the variation in x is 0.90.
ANSWER: T
154.
Given that SSE = 84 and SSR = 358.12, the coefficient of correlation (also called the
Pearson coefficient of correlation) must be 0.90.
ANSWER: F
155.
Except for the values r = -1, 0, and 1, we cannot be specific in our interpretation of the
coefficient of correlation r. However, when we square it we produce a more meaningful
statistic.
ANSWER: T
156.
A zero population correlation coefficient between a pair of random variables means that
there is no linear relationship between the random variables.
ANSWER: T
157.
Given that cov(x, y) = 8, s_y² = 14, s_x² = 10, and n = 6, the value of the sum of squares for
error SSE is 38.
ANSWER: T
STATISTICAL CONCEPTS & APPLIED QUESTIONS
Speed (mph)   Gas Mileage (mpg)
25            40
35            39
45            37
50            33
60            30
65            27
70            25
{Car Speed and Gas Mileage Narrative} Calculate the standard error of estimate, and
describe what this statistic tells you about the regression line.
ANSWER:
s = 1.448; the model fits these data well.
160.
{Car Speed and Gas Mileage Narrative} Do these data provide sufficient evidence at the
5% significance level to infer that a linear relationship exists between higher speeds and
lower gas mileage?
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Rejection region: |t| > t_0.025,5 = 2.571
Test statistic: t = -9.754
Conclusion: Reject the null hypothesis. Yes, these data provide sufficient evidence at the
5% significance level to infer that a linear relationship exists between higher speeds and
lower gas mileage.
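The test can be reproduced with a short Python sketch. Because the narrative has n = 7 observations, the t statistic has n - 2 = 5 degrees of freedom; the critical value 2.571 is hard-coded from a standard t table rather than computed. Variable names are illustrative.

```python
import math

speed = [25, 35, 45, 50, 60, 65, 70]     # x, mph
mileage = [40, 39, 37, 33, 30, 27, 25]   # y, mpg
n = len(speed)

x_bar, y_bar = sum(speed) / n, sum(mileage) / n
sxx = sum((x - x_bar) ** 2 for x in speed)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mileage))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(speed, mileage))
s = math.sqrt(sse / (n - 2))   # standard error of estimate, df = n - 2 = 5
se_b1 = s / math.sqrt(sxx)     # standard error of the slope b1
t = b1 / se_b1                 # test statistic for H0: slope = 0

T_CRIT = 2.571                 # t_0.025,5 taken from a t table (hard-coded)
print(round(t, 3), abs(t) > T_CRIT)  # -9.754 True
```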
161.
{Car Speed and Gas Mileage Narrative} Predict with 99% confidence the gas mileage of
a car traveling 55 mph.
ANSWER:
31.236 ± 6.284. Thus, LCL = 24.952 and UCL = 37.520.
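A Python sketch of the 99% prediction interval ŷ ± t_0.005,n-2 · s · √(1 + 1/n + (x_g - x̄)²/Σ(x - x̄)²), with t_0.005,5 = 4.032 hard-coded from a t table. Working with unrounded coefficients, it reproduces the margin of about 6.284 around ŷ ≈ 31.234; the answer above (31.236) uses the rounded line, so the limits differ in the third decimal. Variable names are illustrative.

```python
import math

speed = [25, 35, 45, 50, 60, 65, 70]
mileage = [40, 39, 37, 33, 30, 27, 25]
n = len(speed)
x_bar, y_bar = sum(speed) / n, sum(mileage) / n
sxx = sum((x - x_bar) ** 2 for x in speed)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mileage)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(speed, mileage)) / (n - 2))

x_g = 55          # speed at which to predict
t_crit = 4.032    # t_0.005,5 from a t table (hard-coded)
y_hat = b0 + b1 * x_g
# Prediction interval for an individual y (note the leading "1 +")
half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / sxx)
print(round(y_hat - half_width, 3), round(y_hat + half_width, 3))
```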
162.
{Car Speed and Gas Mileage Narrative} Calculate the Pearson coefficient of correlation.
ANSWER:
r = -0.975
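The Pearson coefficient can be computed directly as r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²). A minimal Python sketch (illustrative function name) on the Car Speed and Gas Mileage data:

```python
import math


def pearson_r(xs, ys):
    """Sample coefficient of correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([25, 35, 45, 50, 60, 65, 70], [40, 39, 37, 33, 30, 27, 25])
print(round(r, 3), round(r ** 2, 2))  # -0.975 0.95
```

Squaring r gives the coefficient of determination used in the following question.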
163.
{Car Speed and Gas Mileage Narrative} What does the coefficient of correlation tell you
about the direction and strength of the relationship between the two variables?
ANSWER:
There is a very strong negative linear relationship between car speed and gas mileage.
{Car Speed and Gas Mileage Narrative} Calculate the coefficient of determination and interpret its
value.
ANSWER:
R² = 0.95. This means that 95% of the total variation in gas mileage can be explained by
the variation in the speed of the car.
165.
x    y
1    25
2    22
3    21
4    19
5    14
6    15
7    12
8    10
9    6
10   2

x    y
2    7
4    11
6    17
8    21
10   27
13   36
a. Calculate the coefficient of determination, and describe what this statistic tells you
about the relationship between the two variables.
b. Calculate the Pearson coefficient of correlation. What sign does it have? Why?
c. What does the coefficient of correlation calculated in part (b) tell you about the direction and
strength of the relationship between the two variables?
ANSWER:
a. R² = 0.995. This means that 99.5% of the variation in the dependent variable y is
explained by the variation in the independent variable x.
b. r = 0.9975. It is positive because the slope of the regression line is positive.
c. There is a very strong (almost perfect) positive linear relationship between the two
variables.
Average Daily Sunshine (hours)   Skin Cancer (per 100,000)
5                                7
7                                11
6                                9
7                                12
8                                15
6                                10
4                                7
3                                5
{Sunshine and Skin Cancer Narrative} Calculate the standard error of estimate, and
describe what this statistic tells you about the regression line.
ANSWER:
s = 0.9608; the model fits these data well.
168.
{Sunshine and Skin Cancer Narrative} Can we conclude at the 1% significance level that
there is a linear relationship between sunshine and skin cancer?
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Rejection region: |t| > t_0.005,6 = 3.707
Test statistic: t = 8.485
Conclusion: Reject the null hypothesis. Yes, we conclude at the 1% significance level that
there is a linear relationship between sunshine and skin cancer.
169.
{Sunshine and Skin Cancer Narrative} Calculate the coefficient of determination and
interpret it.
ANSWER:
R² = 0.9231. This means that 92.31% of the variation in the incidence of skin cancer is
explained by the variation in the amount of sunshine.
170.
{Sunshine and Skin Cancer Narrative} Calculate the Pearson coefficient of correlation. What sign does
it have? Why?
ANSWER:
r = 0.9608. It is positive since the slope of the regression line (b1 = 1.846) is positive.
171.
{Sunshine and Skin Cancer Narrative} What does the calculated coefficient of correlation
tell you about the direction and strength of the relationship between the two variables?
ANSWER:
There is a very strong (almost perfect) positive linear relationship between the two
variables.
FOR QUESTIONS 172 THROUGH 177, USE THE FOLLOWING NARRATIVE:
Years of Experience   Sales
0                     7
2                     9
10                    20
3                     15
8                     18
5                     14
12                    20
7                     17
20                    30
15                    25
{Sales and Experience Narrative} Determine the standard error of estimate and describe
what this statistic tells you about the regression line.
ANSWER:
s = 1.5724; the model fits these data well.
173.
{Sales and Experience Narrative} Determine the coefficient of determination and discuss
what its value tells you about the two variables.
ANSWER:
R² = 0.9536, which means that 95.36% of the variation in sales is explained by the
variation in years of experience of the salesperson.
174.
{Sales and Experience Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
r = 0.9765. It has a positive sign since the slope of the regression line (b1 = 1.0817) is
positive.
175.
{Sales and Experience Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between years of experience and sales.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Rejection region: |t| > t_0.025,8 = 2.306
Test statistic: t = 12.8258
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
experience and sales.
176.
{Sales and Experience Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between years of experience
and sales.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
{Sales and Experience Narrative} Do the tests of ρ and β1 in the previous two questions
provide the same results? Explain.
ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of
course the same conclusion. This is not a coincidence; the two tests are identical.
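The claim that the two tests are identical can be checked numerically. A minimal Python sketch on the Sales and Experience data computes the slope statistic t = b1/(s/√Sxx) and the correlation statistic t = r√((n - 2)/(1 - r²)); algebraically they are the same quantity, so they agree up to floating-point rounding. Variable names are illustrative.

```python
import math

experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]
n = len(experience)

x_bar, y_bar = sum(experience) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in experience)
syy = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experience, sales))

# Test of the slope: t = b1 / (s / sqrt(Sxx))
b1 = sxy / sxx
sse = syy - sxy ** 2 / sxx        # SSE for the least squares line
s = math.sqrt(sse / (n - 2))
t_slope = b1 / (s / math.sqrt(sxx))

# Test of the correlation: t = r * sqrt((n - 2) / (1 - r^2))
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(t_slope, 4), round(t_corr, 4))  # the two statistics agree
```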
Years of Education   Income ($1000s)
16                   58
11                   40
15                   55
8                    35
12                   43
10                   41
13                   52
14                   49
{Income and Education Narrative} Determine the standard error of estimate and describe
what this statistic tells you about the regression line.
ANSWER:
s = 2.436; the model fits these data well.
180.
{Income and Education Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
r = 0.9604. It has a positive sign since the slope of the regression line (b1 = 2.9098) is
positive.
182.
{Income and Education Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between years of education
and income.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Rejection region: |t| > t_0.025,6 = 2.447
Test statistic: t = 8.439
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and income.
183.
{Income and Education Narrative} Do the tests of ρ and β1 in the previous two questions
provide the same results? Explain.
ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of
course the same conclusion. This is not a coincidence; the two tests are identical.
Years of Education   Winnings
11                   750
15                   400
12                   600
16                   350
11                   800
16                   300
13                   650
14                   400
{Game Winnings and Education Narrative} Determine the standard error of estimate and
describe what this statistic tells you about the regression line.
ANSWER:
s = 59.395; the model fits these data well.
186.
{Game Winnings and Education Narrative} Calculate the Pearson correlation coefficient.
What sign does it have? Why?
ANSWER:
$r = -0.9584$. It has a negative sign because the slope of the regression line ($b_1 = -89.1667$)
is negative.
187.
{Game Winnings and Education Narrative} Conduct a test of the population coefficient
of correlation to determine at the 5% significance level whether a linear relationship
exists between years of education and TV game show winnings.
ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: $|t| > t_{0.025,6} = 2.447$
Test statistic: t = -8.2227
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and TV game show winnings.
{Game Winnings and Education Narrative} Conduct a test of the population slope to
determine at the 5% significance level whether a linear relationship exists between years
of education and TV game show winnings.
ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
{Game Winnings and Education Narrative} Do the tests of $\rho$ and $\beta_1$ in the previous two
questions provide the same results? Explain.
ANSWER:
Yes. This is not a coincidence; the two tests are identical.
190.
Gross Revenue (in $1,000,000s):  48  65  18  20  31  26  73  23  39  58
{Movie Revenues Narrative} Determine the standard error of estimate and describe what
this statistic tells you about the regression line.
ANSWER:
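$s_\varepsilon = 2.0247$. Compared with the mean gross revenue of $\bar{y} = 40.1$ (in $1,000,000s), this is small, so the model fits these data well.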
191.
192.
{Movie Revenues Narrative} Calculate the Pearson correlation coefficient. What sign
does it have? Why?
ANSWER:
$r = 0.9954$. It has a positive sign because the slope of the regression line ($b_1 = 8.285$) is
positive.
193.
194.
{Movie Revenues Narrative} Conduct a test of the population slope to determine at the
5% significance level whether a linear relationship exists between payment to the two
highest-paid performers and gross revenue.
ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
{Movie Revenues Narrative} Do the $\rho$ and $\beta_1$ tests in the previous questions provide the
same results? Explain.
ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of
course the same conclusion.
Number of Pages:  844  727  360  915  295  706  410  905  1058  865  677  912
{Cost of Books Narrative} Determine the coefficient of determination and discuss what
its value tells you.
ANSWER:
$R^2 = 0.9378$, which means that 93.78% of the variation in the price of books is explained
by the variation in the number of pages.
197.
{Cost of Books Narrative} Can we infer at the 5% significance level that the editor is
correct?
ANSWER:
H 0 : 1 0 vs. H 1 : 1 0
Day:                     1     2     3     4     5     6     7     8     9     10
Precipitation (inches):  0.05  0.12  0.05  0.08  0.10  0.35  0.15  0.30  0.10  0.20
Number of Accidents:     5     6     2     4     8     14    7     13    7     10
198.
199.
200.
H 0 : 1 0 vs. H 1 : 1 0
201.
Age:                 62  57  40  49  67  54  43  65  54  41
Number of Concerts:   6   5   4   3   5   5   2   6   3   1

Age:                 44  48  55  60  59  63  69  40  38  52
Number of Concerts:   3   2   4   5   4   5   4   2   1   3

DESCRIPTIVE STATISTICS
                      Age       Concerts
Mean                  53        3.65
Standard Error        2.1849    0.3424
Standard Deviation    9.7711    1.5313
Sample Variance       95.4737   2.3447
Count                 20        20

Regression Statistics
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

              df   SS         MS         F          Significance F
Regression     1   28.65711   28.65711   32.45653   2.1082E-05
Residual      18   15.89289   0.88294
Total         19   44.55

              t Stat     P-value   Upper 95%
Intercept     -2.53491   0.02074   -0.5156
Age            5.69706   0.00002    0.1720
203.
{Willie Nelson Concert Narrative} Determine the standard error of estimate and describe what this statistic tells
you about the model's fit.
ANSWER:
$s_\varepsilon = 0.9396$. Since the sample mean is $\bar{y} = 3.65$, we would have to admit that the
standard error of estimate is not very small. On the other hand, it is not a large number
either. Because there is no predefined upper limit on $s_\varepsilon$, it is difficult in this problem to
assess the model in this way. However, by other criteria the model's fit to these data
seems reasonable.
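Both $s_\varepsilon$ and $R^2$ can be read off the ANOVA part of the output. A minimal sketch, assuming the sums of squares shown in the Willie Nelson Concert output (SSR = 28.65711, SSE = 15.89289, SST = 44.55, n = 20):

```python
import math

# Sums of squares and sample size from the Willie Nelson Concert output.
ssr, sse, sst = 28.65711, 15.89289, 44.55
n = 20
y_bar = 3.65  # mean number of concerts attended

s_e = math.sqrt(sse / (n - 2))  # standard error of estimate = sqrt(MSE)
r_squared = ssr / sst           # coefficient of determination

print(f"s_e = {s_e:.5f}, R^2 = {r_squared:.5f}, s_e / y_bar = {s_e / y_bar:.2f}")
```

The ratio $s_\varepsilon/\bar{y}$ of roughly 0.26 is one informal way to express the comparison with $\bar{y} = 3.65$ made in the answer.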
204.
205.
{Willie Nelson Concert Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
$r = 0.80204$. It has a positive sign because the slope of the regression line, $b_1$, is positive.
206.
207.
{Willie Nelson Concert Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between age and number of
concerts attended.
ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
{Willie Nelson Concert Narrative} Do the $\rho$ and $\beta_1$ tests in the previous two questions
provide the same results? Explain.
Descriptive Statistics
Variable   StDev   SE Mean
Degrees    4.613   1.280
Price      0.457   0.127

Covariances
           Degrees     Price
Degrees    21.281667
Price      2.026750    0.208833

Regression Analysis
Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060
209.
{Oil Quality and Price Narrative} Determine the standard error of estimate and describe
what this statistic tells you.
ANSWER:
$s_\varepsilon = 0.1314$. Since the sample mean is $\bar{y} = 12.73$, the standard error of estimate is judged to
be small, and we may say that the model fits the data well.
210.
{Oil Quality and Price Narrative} Determine the coefficient of determination and discuss
what its value tells you about the two variables.
ANSWER:
$R^2 = 0.9246$, which means that 92.46% of the variation in the oil price per barrel is
explained by the variation in the API degrees.
211.
{Oil Quality and Price Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
$r = 0.9616$. It has a positive sign because the slope of the regression line, $b_1$, is positive.
212.
{Oil Quality and Price Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between the quality of oil and price per barrel.
ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: $|t| > t_{0.025,11} = 2.201$
Test statistic: $t = r\sqrt{(n-2)/(1-r^2)} = 11.61$
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level
that a linear relationship exists between the quality of oil and price per barrel.
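A short sketch of the correlation test statistic from the values quoted for the oil data (r = 0.9616, n = 13, critical value 2.201). With r rounded to four digits it gives about 11.62, consistent with the 11.61 in the answer up to rounding.

```python
import math

r, n = 0.9616, 13   # sample correlation and sample size (oil data)
t_crit = 2.201      # t_{0.025,11}, from the rejection region

t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"t = {t:.2f}, reject H0: {abs(t) > t_crit}")
```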
213.
{Oil Quality and Price Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between the quality of oil
and price per barrel.
ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
{Oil Quality and Price Narrative} Do the $\rho$ and $\beta_1$ tests in the previous two questions
provide the same results? Explain.
ANSWER:
SECTION 6
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, please circle the correct answer.
215.
In order to estimate with 95% confidence the expected value of y for a given value of x
in a simple linear regression problem, a random sample of 10 observations is taken.
Which of the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: b
216.
Given a specific value of x and confidence level, which of the following statements is
correct?
a. The confidence interval estimate of the expected value of y can be calculated but the
prediction interval of y for the given value of x cannot be calculated.
b. The confidence interval estimate of the expected value of y will be wider than the
prediction interval.
c. The prediction interval of y for the given value of x can be calculated but the
confidence interval estimate of the expected value of y cannot be calculated.
d. The confidence interval estimate of the expected value of y will be narrower than the
prediction interval.
ANSWER: d
217.
In order to predict with 90% confidence the expected value of y for a given value of x in a
simple linear regression problem, a random sample of 10 observations is taken. Which of
the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: c
218.
The confidence interval estimate of the expected value of y for a given value of x,
compared to the prediction interval of y for the same given value of x and confidence
level, will be
a. wider
b. narrower
c. the same
d. impossible to know
ANSWER: b
In order to predict with 99% confidence the expected value of y for a given value of x in a
simple linear regression problem, a random sample of 10 observations is taken. Which of
the following t-table values listed below would be used?
a. 1.860
b. 2.306
c. 2.896
d. 3.355
ANSWER: d
220.
The width of the confidence interval estimate for the predicted value of y depends on
a. the standard error of the estimate
b. the value of x for which the prediction is being made
c. the sample size
d. All of the above
ANSWER: d
221.
In order to predict with 80% confidence the expected value of y for a given value of x in a
simple linear regression problem, a random sample of 15 observations is taken. Which of
the following t-table values listed below would be used?
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: a
222.
In order to predict with 98% confidence the expected value of y for a given value of x in a
simple linear regression problem, a random sample of 15 observations is taken. Which of
the following t-table values listed below would be used?
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: d
TRUE / FALSE QUESTIONS
223.
In developing a 95% confidence interval for the expected value of y from a simple linear
regression problem involving a sample of size 10, the appropriate table value would be
1.86.
ANSWER: F
224.
In developing an 80% prediction interval for the particular value of y from a simple linear
regression problem involving a sample of size 12, the appropriate table value would be
1.372
ANSWER: T
225.
In developing a 90% prediction interval for the particular value of y from a simple linear
regression problem involving a sample of size 14, the appropriate table value would be
2.179
ANSWER: F
226.
In order to predict with 95% confidence a particular value of y for a given value of x in
a simple linear regression problem, a random sample of 20 observations is taken. The
appropriate table value that would be used is 2.101.
ANSWER: T
227.
The confidence interval estimate of the expected value of y will be narrower than the
prediction interval for the same given value of x and confidence level. This is because
there is less error in estimating a mean value as opposed to predicting an individual value.
ANSWER: T
228.
The confidence interval estimate of the expected value of y will be wider than the
prediction interval for the same given value of x and confidence level. This is because
there is more error in estimating a mean value as opposed to predicting an individual
value.
ANSWER: F
229.
In developing a 90% confidence interval for the expected value of y from a simple linear
regression problem involving a sample of size 15, the appropriate table value would be
1.761.
ANSWER: F
230.
In developing a 99% confidence interval for the expected value of y from a simple linear
regression problem involving a sample of size 25, the appropriate table value would be
2.807
ANSWER: T
231.
The prediction interval for a particular value of y is always wider than the confidence
interval for the mean value of y, given the same data set, x value, and confidence level.
ANSWER: T
A medical statistician wanted to examine the relationship between the amount of sunshine
(x) and incidence of skin cancer (y). As an experiment he found the number of skin
cancers detected per 100,000 of population and the average daily sunshine in eight
counties around the country. These data are shown below.
Average Daily Sunshine:   5   7   6   7   8   6   4   3
Skin Cancer per 100,000:  7  11   9  12  15  10   7   5
Predict with 95% confidence the skin cancers per 100,000 in a county with a daily
average of 6.5 hours of sunshine.
ANSWER:
10.884 ± 2.525. Thus, LCL = 8.359, and UCL = 13.409
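The interval above has the form $\hat{y} \pm t_{\alpha/2,n-2}\, s_\varepsilon \sqrt{1 + 1/n + (x_g - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. The Python sketch below reproduces it from the skin cancer data. Because the standard library has no t quantile function, the critical value $t_{0.025,6} = 2.447$ is passed in by hand (an assumption of this sketch), and `intervals` is a hypothetical helper name.

```python
import math

# Skin cancer data: average daily sunshine in hours (x), cancers per 100,000 (y).
x = [5, 7, 6, 7, 8, 6, 4, 3]
y = [7, 11, 9, 12, 15, 10, 7, 5]


def intervals(x, y, x_g, t_crit):
    """Return y_hat at x_g and the half-widths of the prediction interval
    (an individual y) and the confidence interval (the expected value of y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_g
    h = 1 / n + (x_g - x_bar) ** 2 / ss_x
    pi_half = t_crit * s_e * math.sqrt(1 + h)
    ci_half = t_crit * s_e * math.sqrt(h)
    return y_hat, pi_half, ci_half


# t_{0.025,6} = 2.447 from a t table (n - 2 = 6 degrees of freedom).
y_hat, pi_half, ci_half = intervals(x, y, 6.5, 2.447)
print(f"prediction interval: {y_hat:.3f} +/- {pi_half:.3f}")
```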
FOR QUESTIONS 233 THROUGH 235, USE THE FOLLOWING NARRATIVE:
Narrative: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important
factor in determining the level of success of a salesperson. To examine this belief she records last
month's sales (in $1,000s) and the years of experience of 10 randomly selected salespeople.
These data are listed below.
Salesperson:          1   2   3   4   5   6   7   8   9  10
Years of Experience:  0   2  10   3   8   5  12   7  20  15
Sales (in $1000s):    7   9  20  15  18  14  20  17  30  25
233.
{Sales and Experience Narrative} Predict with 95% confidence the monthly sales of a
salesperson with 10 years of experience.
ANSWER:
19.447 ± 3.819. Thus, LCL = 15.628 (in $1000s), and UCL = 23.266 (in $1000s)
234.
{Sales and Experience Narrative} Estimate with 95% confidence the average monthly
sales of all salespersons with 10 years of experience.
ANSWER:
19.447 ± 1.199. Thus, LCL = 18.248 (in $1000s), and UCL = 20.646 (in $1000s)
235.
{Sales and Experience Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (10 years) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (10 years) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
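The only difference between the two intervals is the extra 1 under the square root of the prediction interval, which accounts for the scatter of an individual y around its mean. This sketch recomputes both half-widths for the Sales and Experience data at x = 10 years, with $t_{0.025,8} = 2.306$ entered by hand from a t table:

```python
import math

# Sales and Experience data: years of experience (x), monthly sales in $1000s (y).
x = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
y = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]
n, x_g, t_crit = len(x), 10, 2.306  # t_{0.025,8} from a t table

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

h = 1 / n + (x_g - x_bar) ** 2 / ss_x
ci_half = t_crit * s_e * math.sqrt(h)       # expected value of y at x_g
pi_half = t_crit * s_e * math.sqrt(1 + h)   # individual y: the extra 1 is the error term
print(f"y_hat = {b0 + b1 * x_g:.3f}, PI +/- {pi_half:.3f}, CI +/- {ci_half:.3f}")
```

Because $\sqrt{1 + h} > \sqrt{h}$ for every x, the prediction interval is always the wider of the two.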
Years of Education:  16  11  15   8  12  10  13  14
Income (in $1000s):  58  40  55  35  43  41  52  49
{Income and Education Narrative} Predict with 95% confidence the income of an
individual with 10 years of education.
ANSWER:
39.715 ± 6.632. Thus, LCL = 33.083 (in $1000s), and UCL = 46.347 (in $1000s)
237.
{Income and Education Narrative} Estimate with 95% confidence the average income of
all individuals with 10 years of education.
ANSWER:
39.715 ± 2.908. Thus, LCL = 36.807 (in $1000s), and UCL = 42.623 (in $1000s)
238.
{Income and Education Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (10 years) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (10 years) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
Years of Education:  11   15   12   16   11   16   13   14
Winnings ($):       750  400  600  350  800  300  650  400
{Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a contestant who has 15
years of education.
ANSWER:
397.500 ± 159.213. Thus, LCL = $238.287, and UCL = $556.713
240.
{Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a contestant who has 10
years of education.
ANSWER:
843.333 ± 179.971. Thus, LCL = $663.362, and UCL = $1023.304
241.
{Game Winnings and Education Narrative} Estimate with 95% confidence the average winnings of all
contestants who have 15 years of education.
ANSWER:
397.500 ± 64.998. Thus, LCL = $332.502, and UCL = $462.498
242.
{Game Winnings and Education Narrative} Estimate with 95% confidence the average winnings of all
contestants who have 10 years of education.
ANSWER:
843.333 ± 106.141. Thus, LCL = $737.192, and UCL = $949.474
Gross Revenue (in $1,000,000s):  48  65  18  20  31  26  73  23  39  58
{Movie Revenues Narrative} Predict with 95% confidence the gross revenue of a movie
whose top two stars earn $5.0 million.
ANSWER:
45.65 ± 4.916. Thus, LCL = 40.734 (in $1,000,000s), and UCL = 50.566 (in $1,000,000s)
244.
{Movie Revenues Narrative} Estimate with 95% confidence the average gross revenue of
a movie whose top two stars earn $5.0 million.
ANSWER:
45.65 ± 1.54. Thus, LCL = 44.11 (in $1,000,000s), and UCL = 47.19 (in $1,000,000s)
245.
{Movie Revenues Narrative} Which interval in the previous two questions is narrower:
the confidence interval estimate of the expected value of y or the prediction interval for
the same given value of x ($5.0 million) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x ($5.0 million) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
Number of Pages:  844  727  360  915  295  706  410  905  1058  865  677  912
{Cost of Books Narrative} Predict with 90% confidence the selling price of a book with
900 pages.
ANSWER:
56.647 ± 5.311. Thus, LCL = $51.336, and UCL = $61.958
247.
{Cost of Books Narrative} Estimate with 90% confidence the average selling price of all
books with 900 pages.
ANSWER:
56.647 ± 1.803. Thus, LCL = $54.844, and UCL = $58.450
248.
{Cost of Books Narrative} Which interval in the previous two questions is narrower: the
confidence interval estimate of the expected value of y or the prediction interval for the
same given value of x (900 pages) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (900 pages) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
A statistician investigating the relationship between the amount of precipitation (in inches) and
the number of automobile accidents gathered data for 10 randomly selected days. The results
are listed below.
Day:                     1     2     3     4     5     6     7     8     9     10
Precipitation (inches):  0.05  0.12  0.05  0.08  0.10  0.35  0.15  0.30  0.10  0.20
Number of Accidents:     5     6     2     4     8     14    7     13    7     10
249.
{Automobile Accidents and Precipitation Narrative} Predict with 95% confidence the
number of accidents that occur when there is 0.40 inches of rain.
ANSWER:
16.316 ± 4.032. Thus, LCL = 12.284, and UCL = 20.348
250.
{Automobile Accidents and Precipitation Narrative} Estimate with 95% confidence the
average daily number of accidents when the daily precipitation is 0.25 inches.
ANSWER:
11.086 ± 1.377. Thus, LCL = 9.709, and UCL = 12.463
251.
{Automobile Accidents and Precipitation Narrative} Which interval in the previous two
questions is narrower: the confidence interval estimate of the expected value of y or the
prediction interval for the same given value of x and same confidence level?
Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
Age:                 62  57  40  49  67  54  43  65  54  41
Number of Concerts:   6   5   4   3   5   5   2   6   3   1

Age:                 44  48  55  60  59  63  69  40  38  52
Number of Concerts:   3   2   4   5   4   5   4   2   1   3

DESCRIPTIVE STATISTICS
                      Age       Concerts
Mean                  53        3.65
Standard Error        2.1849    0.3424
Standard Deviation    9.7711    1.5313
Sample Variance       95.4737   2.3447
Count                 20        20

Regression Statistics
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

              df   SS         MS         F          Significance F
Regression     1   28.65711   28.65711   32.45653   2.1082E-05
Residual      18   15.89289   0.88294
Total         19   44.55

              t Stat     P-value   Lower 95%   Upper 95%
Intercept     -2.53491   0.02074   -5.50746    -0.5156
Age            5.69706   0.00002    0.07934     0.1720
252.
{Willie Nelson Concert Narrative} Predict with 95% confidence the number of concerts
attended by a 45-year-old individual.
ANSWER:
2.645 ± 2.057. Thus, LCL = 0.588, and UCL = 4.702
253.
{Willie Nelson Concert Narrative} Estimate with 95% confidence the average number of
concerts attended by all 45-year-old individuals.
ANSWER:
2.645 ± 0.577. Thus, LCL = 2.068, and UCL = 3.222
254.
{Willie Nelson Concert Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (age 45) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (age 45) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
Descriptive Statistics
Variable   StDev   SE Mean
Degrees    4.613   1.280
Price      0.457   0.127

Covariances
           Degrees     Price
Degrees    21.281667
Price      2.026750    0.208833

Regression Analysis
Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060
{Oil Quality and Price Narrative} Predict with 95% confidence the oil price per barrel for
an API degree of 35.
ANSWER:
12.768 ± (2.201)(0.1314)(1.038) = 12.768 ± 0.300. Thus, LCL = 12.468, and UCL = 13.068
256.
{Oil Quality and Price Narrative} Estimate with 95% confidence the average oil price per
barrel for an API degree of 35.
ANSWER:
12.768 ± (2.201)(0.1314)(0.2785) = 12.768 ± 0.081. Thus, LCL = 12.687, and UCL = 12.849
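The multipliers 1.038 and 0.2785 come from $\sqrt{1 + 1/n + (x_g - \bar{x})^2/((n-1)s_x^2)}$ and the same expression without the 1. A sketch, assuming the summary statistics from the oil regression output (n = 13, mean API degree 34.60, standard deviation 4.613). Recovering the sum of squared deviations of x as $(n-1)s_x^2$ is the one step not shown in the answers.

```python
import math

# Summary statistics from the oil output: sample size, mean and standard
# deviation of API degrees; s_e and t_{0.025,11} as used in the answers.
n, x_bar, s_x = 13, 34.60, 4.613
s_e, t_crit = 0.1314, 2.201
x_g = 35

ss_x = (n - 1) * s_x ** 2            # sum of squared deviations of x
h = 1 / n + (x_g - x_bar) ** 2 / ss_x

pi_factor = math.sqrt(1 + h)         # multiplier for the prediction interval
ci_factor = math.sqrt(h)             # multiplier for the confidence interval
print(f"factors: {pi_factor:.3f}, {ci_factor:.4f}")
print(f"half-widths: {t_crit * s_e * pi_factor:.3f}, {t_crit * s_e * ci_factor:.3f}")
```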
257.
{Oil Quality and Price Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (an API degree of 35) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (an API degree of 35) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.
SECTION 7
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, please circle the correct answer.
258.
259.
260.
261.
If the plot of the residuals is fan shaped, which assumption of regression analysis is
violated?
a. Normality
b. Homoscedasticity
c. Independence of errors
d. No assumptions are violated; the graph should resemble a fan
ANSWER: b
In regression analysis, we use the Spearman rank correlation coefficient to measure the
relationship between two variables, and to test whether one exists, if
a. one or both variables may be ordinal
b. both variables are interval but the normality requirement is not met
c. both (a) and (b)
d. neither (a) nor (b)
ANSWER: c
263.
The sample Spearman rank correlation coefficient, where a and b are the ranks of x and y,
respectively, is given by
a. rs cov a, b / sa / sb
b. rs cov a, b / sa sb
c. rs cov a, b / sa sb
d. $r_s = \mathrm{cov}(a, b) / (s_a s_b)$
ANSWER: d
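A minimal sketch of option d in Python: rank each variable (ties receive the average of their positions), then divide the sample covariance of the ranks by the product of their sample standard deviations. The helper names `ranks` and `spearman` are illustrative, not from the text.

```python
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Sample Spearman coefficient r_s = cov(a, b) / (s_a * s_b) on the ranks a, b."""
    a, b = ranks(x), ranks(y)
    a_bar, b_bar = mean(a), mean(b)
    cov_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)
    return cov_ab / (stdev(a) * stdev(b))


print(ranks([10, 20, 20, 30]))
print(spearman([1, 2, 3], [3, 2, 1]))
```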
TRUE / FALSE QUESTIONS
264.
The variance of the error variable, $\sigma_\varepsilon^2$, is required to be constant. When this requirement is
satisfied, the condition is called homoscedasticity.
ANSWER: T
265.
The variance of the error variable, $\sigma_\varepsilon^2$, is required to be constant. When this requirement is
violated, the condition is called heteroscedasticity.
ANSWER: T
266.
We standardize residuals in the same way we standardize all variables, by subtracting the
mean and dividing by the variance.
ANSWER: F
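For contrast with the false statement, here is a sketch of how residuals are standardized in this chapter's answers: each residual is divided by the standard error of estimate $s_\varepsilon$, a standard deviation rather than a variance. Least squares residuals have mean zero, so there is nothing to subtract. The inputs are the first five residuals listed for the Willie Nelson Concert narrative, with $s_\varepsilon = 0.93965$; the function name is hypothetical.

```python
def standardize_residuals(residuals, s_e):
    """Divide each residual by the standard error of estimate s_e.

    Least squares residuals already have mean zero, so no mean is subtracted,
    and the divisor is a standard deviation rather than a variance.
    """
    return [res / s_e for res in residuals]


# First five residuals listed for the Willie Nelson Concert narrative.
z = standardize_residuals([1.219, 0.847, 1.984, -0.147, -0.410], 0.93965)
print([round(v, 3) for v in z])
```

The results match the listed standardized residuals to within rounding in the third decimal.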
267.
268.
One method of diagnosing heteroscedasticity is to plot the residuals against the predicted
values of y, then look for a change in the spread of the plotted values.
ANSWER: T
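The residual plot is the diagnostic described here. As a rough numeric companion (an informal heuristic, not a formal test), one can sort residuals by predicted value and compare the spread of the lower and upper halves. The helper `spread_by_half` and the fan-shaped example values are made up for illustration.

```python
from statistics import pstdev


def spread_by_half(predicted, residuals):
    """Sort residuals by predicted value, split at the middle, and return the
    standard deviation of the lower and upper halves."""
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[half:]]
    return pstdev(low), pstdev(high)


# Made-up residuals that fan out as the predicted value grows.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
low_sd, high_sd = spread_by_half(predicted, residuals)
print(f"lower-half SD = {low_sd:.3f}, upper-half SD = {high_sd:.3f}")
```

A ratio far from 1, as in this example, corresponds to the fan shape that signals heteroscedasticity.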
269.
Regardless of the value of x, the standard deviation of the distribution of y values about
the regression line is the same. This assumption of equal standard deviations about the
regression line is called residual analysis.
ANSWER: F
270.
271.
When n is greater than 30, the sample Spearman rank correlation coefficient rs is
approximately normally distributed with mean of 0 and standard deviation of 1.
ANSWER: F
272.
Given that n = 37 and the sample Spearman rank correlation coefficient is $r_s = 0.35$, the
value of the test statistic for testing $H_0: \rho_s = 0$ is $z = 2.10$.
ANSWER: T
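The statement follows from $z = r_s\sqrt{n-1}$, since under $H_0$ the standard deviation of $r_s$ is approximately $1/\sqrt{n-1}$: here $0.35 \times \sqrt{36} = 2.10$. As a one-function Python sketch (the name `spearman_z` is illustrative):

```python
import math


def spearman_z(r_s, n):
    """Large-sample test statistic for H0: rho_s = 0 (r_s has SD about 1/sqrt(n - 1))."""
    return r_s * math.sqrt(n - 1)


print(round(spearman_z(0.35, 37), 2))  # 2.1
```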
273.
Another name for Pearson coefficient of correlation is the Spearman rank correlation
coefficient.
ANSWER: F
Years of Experience:  0   2  10   3   8   5  12   7  20  15
Sales (in $1000s):    7   9  20  15  18  14  20  17  30  25
{Sales and Experience Narrative} Use the regression equation $\hat{y} = 8.63 + 1.0817x$ to
determine the predicted values of y.
ANSWER:
$\hat{y}$: 8.630, 10.793, 19.447, 11.875, 17.284, 14.039, 21.610, 16.202, 30.264, and 24.856
275.
{Sales and Experience Narrative} Use the predicted and actual values of y to calculate the
residuals.
ANSWER:
$r_i$: -1.630, -1.793, 0.553, 3.125, 0.716, -0.039, -1.610, 0.798, -0.264, and 0.144
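A short Python sketch that reproduces the predicted values and residuals from the rounded equation $\hat{y} = 8.63 + 1.0817x$. Residuals are $y_i - \hat{y}_i$, so a point below the line has a negative residual, and least squares residuals sum to (nearly, given the rounded coefficients) zero.

```python
# Sales and Experience data and the rounded fitted line y_hat = 8.63 + 1.0817x.
x = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
y = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]
b0, b1 = 8.63, 1.0817

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # r_i = y_i - y_hat_i

print([round(v, 3) for v in predicted])
print([round(v, 3) for v in residuals])
print(f"sum of residuals: {sum(residuals):.4f}")
```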
276.
{Sales and Experience Narrative} Plot the residuals against the predicted values of y.
Does the variance appear to be constant?
ANSWER:
[Figure: Residuals versus Predicted; vertical axis: Residuals, horizontal axis: Predicted Values]
278.
Years of Education:  16  11  15   8  12  10  13  14
Income (in $1000s):  58  40  55  35  43  41  52  49
{Income and Education Narrative} Use the regression equation $\hat{y} = 10.6165 + 2.9098x$ to
determine the predicted values of y.
ANSWER:
$\hat{y}$: 57.173, 42.624, 54.263, 33.895, 45.534, 39.714, 48.444, and 51.353
{Income and Education Narrative} Use the predicted and actual values of y to calculate
the residuals.
ANSWER:
$r_i$: 0.827, -2.624, 0.737, 1.105, -2.534, 1.286, 3.556, and -2.353.
281.
{Income and Education Narrative} Plot the residuals against the predicted values of y.
Does the variance appear to be constant?
ANSWER:
[Figure: Residuals versus Predicted; vertical axis: Residuals, horizontal axis: Predicted Values]
283.
Contestant:           1    2    3    4    5    6    7    8
Years of Education:  11   15   12   16   11   16   13   14
Winnings ($):       750  400  600  350  800  300  650  400
284.
{Game Winnings and Education Narrative} Use the regression equation $\hat{y} = 1735 - 89.1667x$ to
determine the predicted values of y.
ANSWER:
$\hat{y}$: 754.167, 397.500, 665.000, 308.333, 754.167, 308.333, 575.833, and 486.667
285.
{Game Winnings and Education Narrative} Use the predicted and actual values of y to
calculate the residuals.
ANSWER:
$r_i$: -4.167, 2.500, -65.000, 41.667, 45.833, -8.333, 74.167, and -86.667
{Game Winnings and Education Narrative} Plot the residuals against the predicted values
$\hat{y}$. Does the variance appear to be constant?
ANSWER:
[Figure: Residuals versus Predicted; vertical axis: Residuals, horizontal axis: Predicted Values]
286.
288.
Gross Revenue (in $1,000,000s):  48  65  18  20  31  26  73  23  39  58
{Movie Revenues Narrative} Use the regression equation $\hat{y} = 4.225 + 8.285x$ to determine
the predicted values of y.
ANSWER:
$\hat{y}$: 48.137, 63.878, 14.996, 19.139, 33.223, 25.767, 70.506, 24.110, 41.508, and 59.736.
290.
{Movie Revenues Narrative} Use the predicted and actual values of y to calculate the
residuals.
ANSWER:
$r_i$: -0.137, 1.122, 3.004, 0.861, -2.223, 0.233, 2.494, -1.110, -2.508, and -1.736
291.
{Movie Revenues Narrative} Plot the residuals against the predicted values of y. Does the
variance appear to be constant?
ANSWER:
[Figure: Residuals versus Predicted; vertical axis: Residuals, horizontal axis: Predicted Values]
293.
Age:                 62  57  40  49  67  54  43  65  54  41
Number of Concerts:   6   5   4   3   5   5   2   6   3   1

Age:                 44  48  55  60  59  63  69  40  38  52
Number of Concerts:   3   2   4   5   4   5   4   2   1   3

DESCRIPTIVE STATISTICS
                      Age       Concerts
Mean                  53        3.65
Standard Error        2.1849    0.3424
Standard Deviation    9.7711    1.5313
Sample Variance       95.4737   2.3447
Count                 20        20

Regression Statistics
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

              df   SS         MS         F          Significance F
Regression     1   28.65711   28.65711   32.45653   2.1082E-05
Residual      18   15.89289   0.88294
Total         19   44.55

              t Stat     P-value   Lower 95%   Upper 95%
Intercept     -2.53491   0.02074   -5.50746    -0.5156
Age            5.69706   0.00002    0.07934     0.1720
294.
{Willie Nelson Concert Narrative} Use the regression equation $\hat{y} = -3.0115 + 0.1257x$ to
determine the predicted values of y.
ANSWER:
The predicted values $\hat{y}$, in the same order as the data, are:
4.781  4.153  2.016  3.147  5.410  3.776  2.393  5.158  3.776  2.142
2.519  3.022  3.901  4.530  4.404  4.907  5.661  2.016  1.765  3.524
295.
{Willie Nelson Concert Narrative} Use the predicted values and the actual values of y to
calculate the residuals.
ANSWER:
The residuals $r_i = y_i - \hat{y}_i$, in the same order as the data, are:
1.219  0.847  1.984  -0.147  -0.410  1.224  -0.393  0.842  -0.776  -1.142
0.481  -1.022  0.099  0.470  -0.404  0.093  -1.661  -0.016  -0.765  -0.524
296.
{Willie Nelson Concert Narrative} Plot the residuals against the predicted values $\hat{y}$.
ANSWER:
[Figure: Residuals versus Predicted; vertical axis: Residuals, horizontal axis: Predicted]
297.
298.
[Figure: Histogram of the residuals; vertical axis: Frequency, horizontal axis: Residuals]
299.
{Willie Nelson Concert Narrative} Does it appear that the errors are normally
distributed? Explain.
{Willie Nelson Concert Narrative} Use the residuals to compute the standardized
residuals.
ANSWER:
The standardized residuals $r_i / s_\varepsilon$, in the same order as the data, are:
1.297  0.902  2.111  -0.157  -0.436  1.303  -0.418  0.896  -0.826  -1.215
0.512  -1.087  0.105  0.500  -0.430  0.099  -1.768  -0.017  -0.814  -0.558
301.
Descriptive Statistics
Variable   N    Mean     StDev   SE Mean
Degrees    13   34.60    4.613   1.280
Price      13   12.730   0.457   0.127

Covariances
           Degrees     Price
Degrees    21.281667
Price      2.026750    0.208833

Regression Analysis
Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060
302.
{Oil Quality and Price Narrative} Use the regression equation $\hat{y} = 9.4349 + 0.095235x$ to
determine the predicted values of y.
ANSWER:
The predicted values $\hat{y}$ are: 12.006, 12.149, 12.368, 12.416, 12.473, 12.721, 12.673,
12.740, 12.959, 13.340, 13.340, 13.130, and 13.178.
303.
{Oil Quality and Price Narrative} Use the predicted values and the actual values of y to
calculate the residuals.
ANSWER:
The residuals $r_i = y_i - \hat{y}_i$ are: 0.014, -0.109, -0.048, -0.146, 0.017, -0.021, 0.127, 0.260,
0.041, -0.170, -0.150, 0.090, and 0.092.
{Oil Quality and Price Narrative} Plot the residuals against the predicted values $\hat{y}$.
ANSWER:
[Figure: Residual versus Fitted Value; vertical axis: Residual, horizontal axis: Fitted Value]
304.
305.
{Oil Quality and Price Narrative} Does it appear that heteroscedasticity is a problem?
Explain.
ANSWER:
The variance of the error variable appears to be constant; therefore heteroscedasticity is
not a problem.
306.
[Figure: Histogram of the residuals; vertical axis: Frequency, horizontal axis: Residual]
307.
{Oil Quality and Price Narrative} Does it appear that the errors are normally distributed?
Explain.
ANSWER:
The histogram is fairly symmetric; therefore, the errors appear to be approximately
normally distributed.
308.
{Oil Quality and Price Narrative} Use the residuals to compute the standardized
residuals.
ANSWER:
The standardized residuals $r_i / s_\varepsilon$ are: 0.105, -0.830, -0.366, -1.109, 0.130, -0.156,
0.967, 1.982, 0.315, -1.290, -1.138, 0.685, and 0.703.
309.