Statistics-17 by Keller

Simple Linear Regression and Correlation
89
CHAPTER 17
SIMPLE LINEAR REGRESSION
AND CORRELATION
SECTIONS 1 - 2
MULTIPLE CHOICE QUESTIONS
In the following multiple-choice questions, please circle the correct answer.
1.
The regression line y = 3 + 2x has been fitted to the data points (4, 8), (2, 5), and (1, 2).
The sum of the squared residuals will be:
a. 7
b. 15
c. 8
d. 22
ANSWER: d
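The arithmetic behind this answer can be checked with a short sketch. It assumes only the three data points and the fitted line given in Question 1; the helper name `predict` is illustrative.

```python
# Data points and fitted line from Question 1.
points = [(4, 8), (2, 5), (1, 2)]


def predict(x):
    """Fitted line: y-hat = 3 + 2x."""
    return 3 + 2 * x


# A residual is the actual y minus the predicted y.
residuals = [y - predict(x) for x, y in points]

# The sum of squared residuals (SSE) adds up the squared residuals.
sse = sum(e ** 2 for e in residuals)

print(residuals)
print(sse)
```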
2.
If an estimated regression line has a y-intercept of 10 and a slope of 4, then when x = 2

the actual value of y is:
a. 18
b. 15
c. 14
d. unknown
ANSWER: d
3.
Given the least squares regression line y = 5 - 2x:
a. the relationship between x and y is positive
b. the relationship between x and y is negative
c. as x increases, so does y
d. as x decreases, so does y
ANSWER: b
4.
A regression analysis between weight (y in pounds) and height (x in inches) resulted in

the following least squares line: y = 120 + 5x. This implies that if the height is increased
by 1 inch, the weight, on average, is expected to:
a. increase by 1 pound
b. decrease by 1 pound
c. increase by 5 pounds
d. increase by 24 pounds
ANSWER: c
5.
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the
following least squares line: y = 75 +6x. This implies that if advertising is $800, then
the predicted amount of sales (in dollars) is:
a. $4875
b. $123,000
c. $487,500
d. $12,300
ANSWER: b
6.
A regression analysis between sales (in $1000) and advertising (in $) resulted in the
following least squares line: y = 80,000 + 5x. This implies that an:
a. increase of $1 in advertising is expected, on average, to result in an increase of $5 in
sales
b. increase of $5 in advertising is expected, on average, to result in an increase of $5,000
in sales
c. increase of $1 in advertising is expected, on average, to result in an increase of
$80,005 in sales
d. increase of $1 in advertising is expected, on average, to result in an increase of $5,000
in sales
ANSWER: d
7.
Which of the following techniques is used to predict the value of one variable on the
basis of other variables?
a. Correlation analysis
b. Coefficient of correlation
c. Covariance
d. Regression analysis
ANSWER: d
8.
The residual is defined as the difference between:

a. the actual value of y and the estimated value of y
b. the actual value of x and the estimated value of x
c. the actual value of y and the estimated value of x
d. the actual value of x and the estimated value of y
ANSWER: a
9.
In the simple linear regression model, the y-intercept represents the:

a. change in y per unit change in x
b. change in x per unit change in y
c. value of y when x = 0
d. value of x when y = 0
ANSWER: c
10.
In the first order linear regression model, the population parameters of the y-intercept and
the slope are estimated respectively, by:
a. b0 and b1
b. b0 and β1
c. β0 and b1
d. β0 and β1
ANSWER: a
11.
In the simple linear regression model, the slope represents the:

a. value of y when x = 0
b. average change in y per unit change in x
c. value of x when y = 0
d. average change in x per unit change in y
ANSWER: b
12.
In regression analysis, the residuals represent the:

a. difference between the actual y values and their predicted values
b. difference between the actual x values and their predicted values
c. square root of the slope of the regression line
d. change in y per unit change in x
ANSWER: a
13.
In the first-order linear regression model, the population parameters of the y-intercept and
the slope are, respectively,
a. b0 and b1
b. b0 and β1
c. β0 and b1
d. β0 and β1
ANSWER: d
14.
In a simple linear regression problem, the following statistics are calculated from a
sample of 10 observations: Σ(x - x̄)(y - ȳ) = 2250, s_x = 10, x̄ = 50, ȳ = 75.
The least squares estimates of the slope and y-intercept are respectively:
a. 1.5 and 0.5
b. 2.5 and 1.5
c. 1.5 and 2.5
d. 2.5 and -50.0
ANSWER: d
15.
If a simple linear regression model has no y-intercept, then:

a. all values of x are zero
b. all values of y are zero
c. when y = 0 so does x
d. when x = 0 so does y
ANSWER: d
16.
In the least squares regression line y = 3 - 2x, the predicted value of y equals:
a. 1.0 when x = -1.0
b. 2.0 when x = 1.0
c. 2.0 when x = -1.0
d. 1.0 when x = 1.0
ANSWER: d
17.
The least squares method for determining the best fit minimizes:
a. total variation in the dependent variable
b. sum of squares for error
c. sum of squares for regression
d. All of the above
ANSWER: b
18.
What do we mean when we say that a simple linear regression model is statistically
useful?
a. All the statistics computed from the sample make sense
b. The model is an excellent predictor of y
c. The model is practically useful for predicting y
d. The model is a better predictor of y than the sample y
ANSWER: d

TRUE / FALSE QUESTIONS
19.
An inverse relationship between an independent variable x and a dependent variable y
means that as x increases, y decreases, and vice versa.
ANSWER: T
20.
A direct relationship between an independent variable x and a dependent variable y
means that the variables x and y increase or decrease together.
ANSWER: T
21.
Another name for the residual term in a regression equation is random error.
ANSWER: T
22.
A simple linear regression equation is given by ŷ = 5.25 + 3.8x. The point estimate of y
when x = 4 is 20.45.
ANSWER: T
23.
The vertical spread of the data points about the regression line is measured by the y-intercept.
ANSWER: F
24.
The method of least squares requires that the sum of the squared deviations between
actual y values in the scatter diagram and y values predicted by the regression line be
minimized.
ANSWER: T
25.
A regression analysis between sales (in $1000) and advertising (in $) resulted in the
following least squares line: y = 60 + 5x. This implies that an increase of $1 in
advertising is expected to result in an increase of $65 in sales.
ANSWER: F
26.
A regression analysis between weight (y in pounds) and height (x in inches) resulted in
the following least squares line: y = 135 + 6x. This implies that if the height is
increased by 1 inch, the weight is expected to increase by an average of 6 pounds.
ANSWER: T
27.
The residual r_i is defined as the difference between the actual value y_i and the estimated
value ŷ_i.
ANSWER: T
28.
The regression line y = 2 + 3x has been fitted to the data points (4,11), (2,7), and (1,5).
The sum of squares for error will be 10.0.
ANSWER: T
29.
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the
following least squares line: y = 77 +8x. This implies that if advertising is $600, then
the predicted amount of sales (in dollars) is $125,000.
ANSWER: T
30.
The residuals are observations of the error variable ε. Consequently, the minimized sum
of squared deviations is called the sum of squares for error, denoted SSE.
ANSWER: T
31.
Statisticians have shown that the sample y-intercept b0 and sample slope coefficient b1 are
unbiased estimators of the population regression parameters β0 and β1, respectively.
ANSWER: T
32.
If cov(x, y) = 7.5075 and s_x² = 3.5, then the sample slope coefficient is 2.145.
ANSWER: T
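A minimal sketch of the calculation in Question 32, assuming the usual relationship that the least squares slope equals the sample covariance divided by the sample variance of x. The variable names are illustrative.

```python
# Summary statistics given in Question 32.
cov_xy = 7.5075   # sample covariance of x and y
var_x = 3.5       # sample variance of x

# Least squares slope: b1 = cov(x, y) / s_x^2
b1 = cov_xy / var_x

print(round(b1, 3))
```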
33.
The first order linear model is sometimes called the simple linear regression model.
ANSWER: T
34.
To create a deterministic model, we start with a probabilistic model that approximates the
relationship we want to model.
ANSWER: F
35.
The residual represents the discrepancy between the observed dependent variable and its
predicted or estimated average value.
ANSWER: T

STATISTICAL CONCEPTS & APPLIED QUESTIONS
FOR QUESTIONS 36 AND 37, USE THE FOLLOWING NARRATIVE:
Narrative: Car Speed and Gas Mileage
An economist wanted to analyze the relationship between the speed of a car (x) and its gas
mileage (y). As an experiment a car is operated at several different speeds and for each speed the
gas mileage is measured. These data are shown below.
Speed         25   35   45   50   60   65   70
Gas Mileage   40   39   37   33   30   27   25
36.
{Car Speed and Gas Mileage Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 50.6563 - 0.3531x
37.
{Car Speed and Gas Mileage Narrative} Estimate the gas mileage of a car traveling 70
mph.
ANSWER:
When x = 70, ŷ = 25.9393 mpg
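The two answers above can be reproduced from the raw data with a short sketch. The helper `least_squares` is a hypothetical name, not part of any library, and it applies the textbook formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. With unrounded coefficients the prediction at 70 mph comes out slightly different from the key's value, which appears to use the rounded coefficients.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Data from the Car Speed and Gas Mileage narrative.
speed = [25, 35, 45, 50, 60, 65, 70]
mileage = [40, 39, 37, 33, 30, 27, 25]

b0, b1 = least_squares(speed, mileage)
y_hat_70 = b0 + b1 * 70

print(round(b0, 4), round(b1, 4), round(y_hat_70, 4))
```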
38.
The following 10 observations of variables x and y were collected.

x    1    2    3    4    5    6    7    8    9   10
y   25   22   21   19   14   15   12   10    6    2
Find the least squares regression line, and the estimated value of y when x = 3.
ANSWER:
ŷ = 27.733 - 2.389x. When x = 3, ŷ = 20.566
39.
A scatter diagram includes the following data points:

x    3    2    5    4    5
y    8    6   12   10   14
Two regression models are proposed: (1) ŷ = 1.2 + 2.5x, and (2) ŷ = 5.5 + 4.0x.
Using the least squares method, which of these regression models provides the better fit
to the data? Why?
ANSWER:
SSE = 4.95 and 593.25 for models 1 and 2, respectively. Therefore, model (1) fits the data
better than model (2).
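A sketch of the comparison in Question 39, assuming only the five data points and the two proposed lines. The function name `sse` is illustrative; the model with the smaller sum of squares for error fits better under the least squares criterion.

```python
# Data points from Question 39.
data = [(3, 8), (2, 6), (5, 12), (4, 10), (5, 14)]


def sse(b0, b1, points):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


sse_model_1 = sse(1.2, 2.5, data)
sse_model_2 = sse(5.5, 4.0, data)

print(round(sse_model_1, 2), round(sse_model_2, 2))
print("better fit:", "model 1" if sse_model_1 < sse_model_2 else "model 2")
```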
40.
Consider the following data values of variables x and y.
x    2    4    6    8   10   13
y    7   11   17   21   27   36
a. Determine the least squares regression line.

b. Find the predicted value of y for x = 9.
c. What does the value of the slope of the regression line tell you?
ANSWER:
a. ŷ = 0.934 + 2.637x
b. When x = 9, ŷ = 24.667
c. If x increases by one unit, y on average will increase by 2.637.
FOR QUESTIONS 41 THROUGH 45, USE THE FOLLOWING NARRATIVE:
Narrative: Sunshine and Skin Cancer
A medical statistician wanted to examine the relationship between the amount of sunshine (x) in
hours, and incidence of skin cancer (y). As an experiment he found the number of skin cancers
detected per 100,000 of population and the average daily sunshine in eight counties around the
country. These data are shown below.
Average Daily Sunshine      5    7    6    7    8    6    4    3
Skin Cancer per 100,000     7   11    9   12   15   10    7    5
41.
{Sunshine and Skin Cancer Narrative} Determine the least squares regression line.
ANSWER:
ŷ = -1.115 + 1.846x
42.
{Sunshine and Skin Cancer Narrative} Draw a scatter diagram of the data and plot the
least squares regression line on it.
ANSWER:
[Figure: Average Daily Sunshine Line Fit Plot. Skin cancer (observed and predicted) against average daily sunshine, with the fitted line.]
43.
{Sunshine and Skin Cancer Narrative} Estimate the number of skin cancer per 100,000
of population for 6 hours of sunshine.
ANSWER:
When x = 6, ŷ = 9.961
44.
{Sunshine and Skin Cancer Narrative} What does the value of the slope of the regression
line tell you?
ANSWER:
If the amount of sunshine x increases by one hour, the amount of skin cancer y increases
by an average of 1.846 per 100,000 of population.
45.
{Sunshine and Skin Cancer Narrative} Calculate the residual corresponding to the pair (x,
y) = (8, 15).
ANSWER:
e = y - ŷ = 15 - 13.653 = 1.347

NARRATIVE: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important
factor in determining the level of success of a salesperson. To examine this belief she records last
month's sales (in $1,000s) and the years of experience of 10 randomly selected salespeople.
These data are listed below.
Salesperson   Years of Experience   Sales
1             0                     7
2             2                     9
3             10                    20
4             3                     15
5             8                     18
6             5                     14
7             12                    20
8             7                     17
9             20                    30
10            15                    25
46.
{Sales and Experience Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram. Sales against Years of Experience.]
It appears that a linear model is appropriate.

47.
{Sales and Experience Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 8.63 + 1.0817x
48.
{Sales and Experience Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional year of experience, monthly sales of a salesperson increase by an
average of $1,081.70.
49.
{Sales and Experience Narrative} Estimate the monthly sales for a salesperson with 16
years of experience.
ANSWER:
When x = 16, ŷ = 25.94

Narrative: Income and Education
A professor of economics wants to study the relationship between income (y in $1000s) and
education (x in years). A random sample of eight individuals is taken and the results are shown
below.
Education   16   11   15    8   12   10   13   14
Income      58   40   55   35   43   41   52   49
50.
{Income and Education Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram. Income against Years of Education.]

51.
{Income and Education Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 10.6165 + 2.9098x
52.
{Income and Education Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional year of education, the income increases by an average of $2,909.80.
53.
{Income and Education Narrative} Estimate the income of an individual with 15 years of
education.
ANSWER:
When x = 15, ŷ = 54.264 (in $1000s) or $54,264

Narrative: Game Winnings and Education
An ardent fan of television game shows has observed that, in general, the more educated the
contestant, the less money he or she wins. To test her belief she gathers data about the last eight
winners of her favorite game show. She records their winnings in dollars and the number of years
of education. The results are as follows.
Contestant   Years of Education   Winnings
1            11                   750
2            15                   400
3            12                   600
4            16                   350
5            11                   800
6            16                   300
7            13                   650
8            14                   400
54.
{Game Winnings and Education Narrative} Draw a scatter diagram of the data to
determine whether a linear model appears to be appropriate.
ANSWER:
[Figure: Scatter Diagram. Winnings against Years of Education.]

55.
{Game Winnings and Education Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 1735 - 89.1667x
56.
{Game Winnings and Education Narrative} Interpret the value of the slope of the
regression line.
ANSWER:
For each additional year of education a contestant has, his or her winnings on TV game
shows decrease by an average of approximately $89.17.
57.
{Game Winnings and Education Narrative} Estimate the game winnings for a contestant
with 15 years of education.
ANSWER:
When x = 15, ŷ = $397.50

Narrative: Movie Revenues
A financier whose specialty is investing in movie productions has observed that, in general,
movies with big-name stars seem to generate more revenue than those movies whose stars are
less well known. To examine his belief he records the gross revenue and the payment (in $
millions) given to the two highest-paid performers in the movie for ten recently released movies.
Movie   Cost of Two Highest Paid Performers   Gross Revenue
1       5.3                                   48
2       7.2                                   65
3       1.3                                   18
4       1.8                                   20
5       3.5                                   31
6       2.6                                   26
7       8.0                                   73
8       2.4                                   23
9       4.5                                   39
10      6.7                                   58
58.
{Movie Revenues Narrative} Draw a scatter diagram of the data to determine whether a
linear model appears to be appropriate.
ANSWER: It appears that a linear model is appropriate.
[Figure: Scatter Diagram. Gross Revenue against Payment to Top Two Stars.]
59.
{Movie Revenues Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 4.225 + 8.285x
60.
{Movie Revenues Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For each additional million dollars paid to the two highest paid performers, the gross revenue of the
movie increases by an average of $8.285 million.
61.
{Movie Revenues Narrative} Estimate the gross revenue of a movie if the two highest
paid performers received 6 million dollars.
ANSWER:
When x = 6, ŷ = $53.935 million

NARRATIVE: Cost of Books
The editor of a major academic book publisher claims that a large part of the cost of books is the
cost of paper. This implies that larger books will cost more money. As an experiment to analyze
the claim, a university student visits the bookstore and records the number of pages and the
selling price of twelve randomly selected books. These data are listed below.
Book   Number of Pages   Selling Price ($)
1      844               55
2      727               50
3      360               35
4      915               60
5      295               30
6      706               50
7      410               40
8      905               53
9      1058              65
10     865               54
11     677               42
12     912               58
62.
{Cost of Books Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 19.387 + 0.0414x
63.
{Cost of Books Narrative} Draw a scatter diagram of the data and plot the least squares
regression line on it.
ANSWER:
[Figure: Number of Pages Line Fit Plot. Selling price (observed and predicted) against number of pages, with the fitted line.]
64.
{Cost of Books Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional page, the price of a book increases by an average of about 4 cents.
65.
{Cost of Books Narrative} Estimate the selling price for a 650-page book.
ANSWER:
When x = 650, ŷ = 19.387 + 0.0414(650) ≈ $46.30

Narrative: Accidents and Precipitation
A statistician investigating the relationship between the amount of precipitation (in inches) and
the number of automobile accidents gathered data for 10 randomly selected days. The results
are shown below.
Day   Precipitation   Number of Accidents
1     0.05            5
2     0.12            6
3     0.05            2
4     0.08            4
5     0.10            8
6     0.35            14
7     0.15            7
8     0.30            13
9     0.10            7
10    0.20            10
66.
{Accidents and Precipitation Narrative} Find the least squares regression line.
ANSWER:
ŷ = 2.3704 + 34.864x
67.
{Accidents and Precipitation Narrative} Estimate the number of accidents in a day with
0.25 inches of precipitation.
ANSWER:
When x = 0.25, ŷ = 11.08 ≈ 11 accidents
68.
{Accidents and Precipitation Narrative} What does the slope of the least squares
regression line tell you?
ANSWER:
For each additional inch of precipitation, the number of accidents on average increases by
34.864 (about 35 accidents).
FOR QUESTIONS 69 THROUGH 73, USE THE FOLLOWING NARRATIVE:

Narrative: Willie Nelson Concert
At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20
people their age and how many concerts they have attended since the first of the year. The
following data were collected:
Age   Number of Concerts
62    6
57    5
40    4
49    3
67    5
54    5
43    2
65    6
54    3
41    1
44    3
48    2
55    4
60    5
59    4
63    5
69    4
40    2
38    1
52    3
An Excel output follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

DESCRIPTIVE STATISTICS
                     Age        Concerts
Mean                 53         3.65
Standard Error       2.1849     0.3424
Standard Deviation   9.7711     1.5313
Sample Variance      95.4737    2.3447
Count                20         20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

ANOVA
             df   SS         MS         F          Significance F
Regression   1    28.65711   28.65711   32.45653   2.1082E-05
Residual     18   15.89289   0.88294
Total        19   44.55

            Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept   -3.01152       1.18802          -2.53491   0.02074   -5.50746    -0.5156
Age         0.12569        0.02206          5.69706    0.00002   0.07934     0.1720
69.
{Willie Nelson Concert Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate to describe the relationship between the
age and number of concerts attended by the respondents.
ANSWER:
[Figure: Scatter Diagram. Number of Concerts against Age.]
A linear model appears to be appropriate to describe the relationship between the age and
number of concerts attended by the respondents.
70.
{Willie Nelson Concert Narrative} Determine the least squares regression line.
ANSWER:
ŷ = -3.0115 + 0.1257x
71.
{Willie Nelson Concert Narrative} Plot the least squares regression line on the scatter
diagram.
ANSWER:
[Figure: Scatter Diagram with Trendline. Number of Concerts against Age, with the fitted line.]
72.
{Willie Nelson Concert Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional year of age, the number of concerts attended increases on average by
0.1257. Equivalently we may say, for every additional 20 years of age, the number of
concerts attended increases on average by about 2.50.
73.
{Willie Nelson Concert Narrative} Estimate the number of Willie Nelson concerts
attended by a 64 year old person.
ANSWER:
When x = 64, ŷ = 5.03 (about 5 concerts)

Narrative: Oil Quality and Price
Quality of oil is measured in API gravity degrees: the higher the degrees API, the higher the
quality. The table shown below is produced by an expert in the field who believes that there is a
relationship between quality and price per barrel.
Oil degrees API   Price per barrel (in $)
27.0              12.02
28.5              12.04
30.8              12.32
31.3              12.27
31.9              12.49
34.5              12.70
34.0              12.80
34.7              13.00
37.0              13.00
41.0              13.17
41.0              13.19
38.8              13.22
39.3              13.27
A partial Minitab output follows:

Descriptive Statistics
Variable   N    Mean     StDev   SE Mean
Degrees    13   34.60    4.613   1.280
Price      13   12.730   0.457   0.127

Covariances
          Degrees     Price
Degrees   21.281667
Price     2.026750    0.208833

Regression Analysis
Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       1    2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060
74.
{Oil Quality and Price Narrative} Draw a scatter diagram of the data to determine
whether a linear model appears to be appropriate to describe the relationship between the
quality of oil and price per barrel.
ANSWER:
[Figure: Scatter Diagram. Price against Degrees.]
A linear model appears to be appropriate to describe the relationship between the quality
of oil and price per barrel.
75.
{Oil Quality and Price Narrative} Determine the least squares regression line.
ANSWER:
ŷ = 9.4349 + 0.095235x
76.
{Oil Quality and Price Narrative} Plot the least squares regression line on the scatter
diagram.
ANSWER:
[Figure: Scatter Diagram. Price against Degrees, with the fitted line.]
77.
{Oil Quality and Price Narrative} Interpret the value of the slope of the regression line.
ANSWER:
For every additional API gravity degree, the price of oil per barrel increases by an
average of 9.52 cents.

SECTIONS 3 - 4
78.
In a simple linear regression problem, the following sum of squares are produced:
Σ(y_i - ȳ)² = 200, Σ(y_i - ŷ_i)² = 50, and Σ(ŷ_i - ȳ)² = 150. The percentage of the
variation in y that is explained by the variation in x is:
a. 25%
b. 75%
c. 33%
d. 50%
ANSWER: b
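A short sketch of the calculation in Question 78, assuming the standard partition of total variation into explained (SSR) and unexplained (SSE) parts. The variable names are illustrative.

```python
# Sums of squares given in Question 78.
sst = 200   # total variation in y
sse = 50    # unexplained variation (sum of squares for error)
ssr = 150   # explained variation (sum of squares for regression)

# The total variation splits into explained plus unexplained parts.
assert sst == ssr + sse

# Coefficient of determination: share of the variation in y explained by x.
r_squared = ssr / sst

print(f"{r_squared:.0%}")
```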
79.
In simple linear regression, most often we perform a two-tail test of the population slope
β1 to determine whether there is sufficient evidence to infer that a linear relationship
exists. The null hypothesis is stated as:
a. H0: β1 = 0
b. H0: β1 = b1
c. H0: β1 = r
d. H0: β1 = s
ANSWER: a
80.
Testing whether the slope of the population regression line could be zero is equivalent to
testing whether the:
a. sample coefficient of correlation could be zero
b. standard error of estimate could be zero
c. population coefficient of correlation could be zero
d. sum of squares for error could be zero
ANSWER: c
81.
Given that s_x² = 500, s_y² = 750, cov(x, y) = 100, and n = 6, the standard error of
estimate is:
a. 12.247
b. 24.933
c. 30.2076
d. 11.180
ANSWER: c
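One way to reach this answer is sketched below. It assumes the shortcut identity SSE = (n - 1)(s_y² - cov(x, y)² / s_x²) and the definition of the standard error of estimate as the square root of SSE / (n - 2). The variable names are illustrative.

```python
import math

# Summary statistics given in Question 81.
var_x = 500    # sample variance of x
var_y = 750    # sample variance of y
cov_xy = 100   # sample covariance of x and y
n = 6          # number of observations

# Shortcut for the sum of squares for error.
sse = (n - 1) * (var_y - cov_xy ** 2 / var_x)

# Standard error of estimate, with n - 2 degrees of freedom.
s_e = math.sqrt(sse / (n - 2))

print(sse, round(s_e, 4))
```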
82.
The symbol for the population coefficient of correlation is:
a. r
b. ρ
c. r²
d. ρ²
ANSWER: b
83.
Given that the sum of squares for error is 60 and the sum of squares for regression is 140,
then the coefficient of determination is:
a. 0.429
b. 0.300
c. 0.700
d. 0.837
ANSWER: c
84.
A regression line using 25 observations produced SSR = 118.68 and SSE = 56.32. The
standard error of estimate was:
a. 2.1788
b. 1.5648
c. 1.5009
d. 2.2716
ANSWER: b
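A minimal sketch of the calculation in Question 84, assuming the standard error of estimate is the square root of SSE / (n - 2). Note that SSR is not needed for this quantity.

```python
import math

# Values given in Question 84.
sse = 56.32   # sum of squares for error
n = 25        # number of observations

# Standard error of estimate, with n - 2 degrees of freedom.
s_e = math.sqrt(sse / (n - 2))

print(round(s_e, 4))
```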
85.
The symbol for the sample coefficient of correlation is:

a. r
b. ρ
c. r²
d. ρ²
ANSWER: a
86.
Given the least squares regression line y = -2.48 + 1.63x, and a coefficient of
determination of 0.81, the coefficient of correlation is:
a. -0.85
b. 0.85
c. -0.90
d. 0.90
ANSWER: d
87.
Which value of the coefficient of correlation r indicates a stronger correlation than 0.65?
a. 0.55
b. -0.75
c. 0.60
d. -0.45
ANSWER: b
88.
If the coefficient of determination is 0.975, then the slope of the regression line:
a. must be positive

b. must be negative
c. could be either positive or negative
d. None of the above.
ANSWER: c
89.
In regression analysis, if the coefficient of determination is 1.0, then:

a. the sum of squares for error must be 1.0
b. the sum of squares for regression must be 1.0
c. the sum of squares for error must be 0.0
d. the sum of squares for regression must be 0.0
ANSWER: c
90.
The sample correlation coefficient between x and y is 0.375. It has been found that
the p-value is 0.744 when testing H0: ρ = 0 against the one-sided alternative H1: ρ < 0.
To test H0: ρ = 0 against the two-sided alternative H1: ρ ≠ 0 at a significance level
of 0.193, the p-value is
a. 0.372
b. 1.488
c. 0.256
d. 0.512
ANSWER: d
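The conversion behind this answer can be sketched as follows, assuming the test statistic has a symmetric distribution (as the t statistic does), so that the two one-sided p-values sum to 1 and the two-sided p-value is twice the smaller of them. The function name is hypothetical.

```python
def two_sided_p_value(one_sided_p):
    """Two-sided p-value from a one-sided p-value, assuming a symmetric test statistic."""
    # The opposite one-sided alternative has p-value 1 - one_sided_p;
    # the two-sided p-value doubles the smaller tail area.
    return 2 * min(one_sided_p, 1 - one_sided_p)


p_two_sided = two_sided_p_value(0.744)

print(round(p_two_sided, 3))
```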
91.
Correlation analysis is used to determine:

a. the strength of the relationship between x and y
b. the least squares estimates of the regression parameters
c. the predicted value of y for a given value of x
d. the coefficient of determination
ANSWER: a
92.
If the coefficient of correlation is -0.80, then the percentage of the variation in y that is
explained by the variation in x is:
a. 80%
b. 64%
c. -80%
d. -64%
ANSWER: b
93.
If all the points in a scatter diagram lie on the least squares regression line, then the
coefficient of correlation must be:
a. 1.0
b. -1.0
c. either 1.0 or -1.0
d. 0.0
ANSWER: c
94.
If the coefficient of correlation is 0.60, then the coefficient of determination is:
a. -0.60
b. -0.36
c. 0.36
d. 0.40
ANSWER: c
95.
In regression analysis, if the coefficient of correlation is 1.0, then:

a. the sum of squares for error is 1.0
b. the sum of squares for regression is 1.0
c. the sum of squares for error and sum of squares for regression are equal
d. the sum of squares for regression and total variation in y are equal
ANSWER: d
96.
If the coefficient of correlation between x and y is close to 1.0, this indicates that:
a. y causes x to happen
b. x causes y to happen
c. both (a) and (b)
d. there may or may not be any causal relationship between x and y
ANSWER: d
97.
For the values of the coefficient of determination listed below, which one implies the
greatest value of the sum of squares for regression given that the total variation in y is
1800?
a. 0.69
b. 0.96
c. 0.58
d. 0.85
ANSWER: b
98.
When all the actual and predicted values of y are equal, the standard error of estimate will
be:
a. -1.0
b. 1.0
c. 0.0
d. 2.0
ANSWER: c
99.
Which of the following statistics and procedures can be used to determine whether a
linear model should be employed?
a. The standard error of estimate
b. The coefficient of determination
c. The t-test of the slope
d. All of the above
ANSWER: d

100.
In testing the hypotheses H0: β1 = 0 vs. H1: β1 ≠ 0, the following statistics are
available: n = 10, b0 = 1.8, b1 = 2.45, s_b1 = 1.20, and ȳ = 6. The value of the
test statistic is:
a. 2.042
b. 0.306
c. 1.50
d. -0.300
ANSWER: a
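A sketch of the test statistic in Question 100, assuming the usual t statistic for the slope, t = (b1 - β1) / s_b1 with β1 = 0 under the null hypothesis and n - 2 degrees of freedom. The intercept and the mean of y are not needed.

```python
# Statistics given in Question 100.
n = 10
b1 = 2.45           # sample slope
s_b1 = 1.20         # standard error of the slope
beta1_null = 0.0    # slope value under the null hypothesis

# t statistic for the slope and its degrees of freedom.
t_stat = (b1 - beta1_null) / s_b1
df = n - 2

print(round(t_stat, 3), df)
```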
101.
The standard error of estimate s is given by:

a. SSE/(n - 2)
b. √SSE/(n - 2)
c. √(SSE/(n - 2))
d. SSE/√(n - 2)
ANSWER: c
102.
If the standard error of estimate s = 20 and n = 10, then the sum of squares for error,
SSE, is:
a. 400
b. 3200
c. 4000
d. 40000
ANSWER: b
103.
The smallest value that the standard error of estimate s can assume is:
a. -1
b. 0
c. 1
d. 2
ANSWER: b
104.
If cov(x, y) = 1260, s_x² = 1600, and s_y² = 1225, then the coefficient of determination is:
a. 0.7875
b. 1.0286
c. 0.8100
d. 0.7656
ANSWER: c
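A minimal sketch of the calculation in Question 104, assuming the coefficient of correlation is r = cov(x, y) / (s_x · s_y) and the coefficient of determination is its square. The variable names are illustrative.

```python
import math

# Summary statistics given in Question 104.
cov_xy = 1260
var_x = 1600   # s_x squared
var_y = 1225   # s_y squared

# Coefficient of correlation and coefficient of determination.
r = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
r_squared = r ** 2

print(round(r, 2), round(r_squared, 2))
```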
105.
The standard error of estimate s is a measure of the:

a. variation of y around the regression line
b. variation of x around the regression line
c. variation of y around the mean y
d. variation of x around the mean x
ANSWER: a
106.
The Pearson coefficient of correlation r equals 1 when there is no:

a. explained variation
b. unexplained variation
c. y-intercept in the model
d. outliers
ANSWER: b
107.
In regression analysis, the coefficient of determination R² measures the amount of
variation in y that is:
a. caused by the variation in x
b. explained by the variation in x
c. unexplained by the variation in x
d. None of the above
ANSWER: b
108.
If we are interested in determining whether two variables are linearly related, it is

necessary to:
a. perform the t-test of the slope β1
b. perform the t-test of the coefficient of correlation
c. either (a) or (b) since they are identical
d. calculate the standard error of estimate s
ANSWER: c
109.
In a regression problem the following pairs of (x,y) are given: (3,1), (3,-1), (3,0), (3,-2)
and (3,2). That indicates that the:
a. correlation coefficient is -1
b. correlation coefficient is 0
c. correlation coefficient is 1
d. coefficient of determination is between -1 and 1
ANSWER: b
110.
In a regression problem, if the coefficient of determination is 0.95, this means that:

a. 95% of the y values are positive
b. 95% of the variation in y can be explained by the variation in x
c. 95% of the x values are equal
d. 95% of the variation in x can be explained by the variation in y
ANSWER: b
111.
The sample correlation coefficient between x and y is 0.375. It has been found that
the p-value is 0.256 when testing H0: ρ = 0 against the two-sided alternative
H1: ρ ≠ 0. To test H0: ρ = 0 against the one-sided alternative H1: ρ > 0 at a significance
level of 0.193, the p-value will be equal to
a. 0.128
b. 0.512
c. 0.744
d. 0.872
ANSWER: a
112.
In simple linear regression, which of the following statements indicate no linear

relationship between the variables x and y?
a. Coefficient of determination is 1.0
b. Coefficient of correlation is 0.0
c. Sum of squares for error is 0.0
d. Sum of squares for regression is relatively large
ANSWER: b
113.
If the sum of squared residuals is zero, then the:

a. coefficient of determination must be 1.0
b. coefficient of correlation must be 1.0
c. coefficient of determination must be 0.0
d. coefficient of correlation must be 0.0
ANSWER: a
114.
In a regression problem, if all the values of the independent variable are equal, then the
coefficient of determination must be:
a. -1.0
b. 0.5
c. 0.0
d. 1.0
ANSWER: c
115.
The standard error of the estimate is a measure of

a. total variation of the y variable
b. the variation around the sample regression line
c. explained variation
d. the variation of the x variable
ANSWER: b
116.
In simple linear regression, the coefficient of correlation r and the least squares estimate
b1 of the population slope β1:
a. must be equal
b. must have opposite signs
c. must have the same sign
d. may have opposite signs or the same sign
ANSWER: c
117.
The coefficient of determination (R²) tells us

a. that the coefficient of correlation is larger than 1
b. whether r has any significance
c. that we should not partition the total variation
d. the proportion of total variation in y that is explained by x
ANSWER: d
118.
In performing a regression analysis involving two numerical variables, we are assuming:

a. the variances of x and y are equal
b. the variation around the line of regression is the same for each x value
c. that x and y are independent
d. All of the above
ANSWER: b
119.
Which of the following assumptions concerning the probability distribution of the

random error term is stated incorrectly?
a. The distribution is normal
b. The mean of the distribution is 0
c. The variance of the distribution increases as x increases
d. The errors are independent
ANSWER: c
120.
If the correlation coefficient (r) = 1.00, then

a. The y-intercept (b0) must equal 0
b. The explained variation equals the unexplained variation
c. There is no unexplained variation
d. There is no explained variation
ANSWER: c
121.
In a simple linear regression problem, r and b1

a. may have opposite signs
b. must have the same sign
c. must have opposite signs
d. must be equal
ANSWER: b
122.
the p value is 0.256 when testing H o : 0 against a two-sided alternative H1 : 0 .
To test H o : 0 against the one-sided alternative H1 : 0 at a significance level of
0.193, the p - value will be equal to

a. 0.128
b. 0.512
c. 0.744
d. 0.872
ANSWER: d
123.
Which of the following is not a required condition for the error variable ε in the simple
linear regression model?
a. The probability distribution of ε is normal.
b. The mean of the probability distribution of ε is zero.
c. The standard deviation of ε is a constant no matter what the value of x.
d. The values of ε are autocorrelated.
ANSWER: d
124.
Testing for existence of correlation is equivalent to

a. testing for the existence of the slope (β1)
b. testing for the existence of the Y intercept (β0)
c. the confidence interval estimate for predicting Y
d. None of the above
ANSWER: a
125.
The coefficient of determination R^2 measures the amount of:

a. variation in y that is explained by variation in x
b. variation in x that is explained by variation in y
c. variation in y that is unexplained by variation in x
d. variation in x that is unexplained by variation in y
ANSWER: a
126.
If the coefficient of correlation is 0.90, then the percentage of the variation in the
dependent variable y that is explained by the variation in the independent variable x is:
a. 90%
b. 81%
c. 0.90%
d. 0.81%
ANSWER: b
127.
If a researcher wanted to find out if alcohol consumption and grade point average on a 4
point scale are linearly related, he would perform a
a. χ^2 test for the difference in two proportions
b. χ^2 test for independence
c. z test for the difference in two proportions
d. a t test for no linear relationship between the two variables
ANSWER: d

128.
If the value of the sum of squares for error SSE equals zero, then the coefficient of
determination must equal zero.
ANSWER: F
129.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of the estimate will be 1.0.
ANSWER: F
130.
The value of the sum of squares for regression SSR can never be smaller than 0.0.
ANSWER: T
131.
The value of the sum of squares for regression SSR can never be smaller than 1.
ANSWER: F
132.
If all the values of an independent variable x are equal, then regressing a dependent
variable y on x will result in a coefficient of determination of zero.
ANSWER: T
133.
In a simple linear regression model, testing whether the slope β1 of the population
regression line could be zero is the same as testing whether or not the population
coefficient of correlation equals zero.
ANSWER: T
134.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of estimate s will be 0.0.
ANSWER: T
135.
If there is no linear relationship between two variables x and y , the coefficient of

determination must be 1.0.
ANSWER: F
136.
The value of the sum of squares for regression SSR can never be larger than the value of
sum of squares for error SSE.
ANSWER: F
137.
When the actual values y of a dependent variable and the corresponding predicted values
ŷ are the same, the standard error of estimate s will be -1.0.
ANSWER: F
138.
In a simple linear regression problem, the least squares line is y = -3.75 + 1.25 x , and
the coefficient of determination is 0.81. The coefficient of correlation must be 0.90.
ANSWER: F
139.
In simple linear regression, the divisor of the standard error of estimate s is n - 2.

ANSWER: T
140.
In a regression problem the following pairs of (x, y) are given: (4,-2), (4,-1), (4,0), (4,1)
and (4,2). That indicates that the coefficient of correlation is 1.
ANSWER: F
141.
The value of the sum of squares for regression SSR can never be larger than the value of
total sum of squares SST.
ANSWER: T
142.
In regression analysis, if the coefficient of determination is 1.0, then the coefficient of

correlation must be 1.0.
ANSWER: F
143.
Correlation analysis is used to determine the strength of the relationship between an

independent variable x and dependent variable y.
ANSWER: T
144.
If the coefficient of correlation is 0.81, then the percentage of the variation in y that is
explained by the regression line is 81%.
ANSWER: F
145.
If all the points in a scatter diagram lie on the least squares regression line, then the
coefficient of correlation must be 1.0.
ANSWER: F
146.
If the standard error of estimate s = 20 and n = 8, then the sum of squares for error SSE
is 2,400.
ANSWER: T
147.
The probability distribution of the error variable ε is normal, with mean E(ε) = 0, and
standard deviation σε = 1.
ANSWER: F
148.
In a simple linear regression problem, if the coefficient of determination is 0.95, this

means that 95% of the variation in the independent variable x can be explained by
regression line.
ANSWER: F
149.
Given that cov(x, y) = 10, s_y^2 = 15, s_x^2 = 8, and n = 12, the value of the standard error of
estimate s is 2.75.
ANSWER: F

150.
If the error variable is normally distributed, the test statistic for testing H0: β1 = 0 is
Student t distributed with n - 2 degrees of freedom.
ANSWER: T
151.
Given that cov(x, y) = 8.5, s_y^2 = 8, and s_x^2 = 10, then the value of the coefficient of
determination is 0.95.
ANSWER: F
152.
The coefficient of determination is the coefficient of correlation squared. That is, R^2 = r^2.

ANSWER: T
153.
Given that SSE = 60 and SSR = 540, the proportion of the variation in y that is explained
by the variation in x is 0.90.
ANSWER: T
154.
Given that SSE = 84 and SSR = 358.12, the coefficient of correlation (also called the
Pearson coefficient of correlation) must be 0.90.
ANSWER: F
155.
Except for the values r = -1, 0, and 1, we cannot be specific in our interpretation of the
coefficient of correlation r. However, when we square it we produce a more meaningful
statistic.
ANSWER: T
156.
A zero population correlation coefficient between a pair of random variables means that
there is no linear relationship between the random variables.
ANSWER: T
157.
Given that cov(x, y) = 8, s_y^2 = 14, s_x^2 = 10, and n = 6, the value of the sum of squares for
error SSE is 38.
ANSWER: T
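The summary-statistic questions above (149 and 157) can be checked numerically. The following is a minimal Python sketch, assuming the shortcut SSE = (n - 1)(s_y^2 - cov(x, y)^2 / s_x^2) and s = sqrt(SSE / (n - 2)); the function names are illustrative and not part of the original text.

```python
import math

def sse_from_summary(cov_xy, var_y, var_x, n):
    """Sum of squares for error from the sample covariance and variances."""
    return (n - 1) * (var_y - cov_xy ** 2 / var_x)

def standard_error_of_estimate(sse, n):
    """Standard error of estimate; the divisor is n - 2."""
    return math.sqrt(sse / (n - 2))

# Question 157: cov = 8, s_y^2 = 14, s_x^2 = 10, n = 6
sse_157 = sse_from_summary(cov_xy=8, var_y=14, var_x=10, n=6)

# Question 149: cov = 10, s_y^2 = 15, s_x^2 = 8, n = 12
sse_149 = sse_from_summary(cov_xy=10, var_y=15, var_x=8, n=12)
s_149 = standard_error_of_estimate(sse_149, n=12)

print(round(sse_157, 3), round(s_149, 3))
```

Under these assumptions the sketch gives SSE = 38 for question 157 and a standard error of about 1.66 (not 2.75) for question 149, in line with the answer keys.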
158.
A store manager gives a pre-employment examination to new employees. The test is

scored from 1 to 100. He has data on their sales at the end of one year measured in
dollars. He wants to know if there is any linear relationship between pre-employment
examination score and sales. An appropriate test to use is the t test on the population
correlation coefficient.
ANSWER: T

Narrative: Car Speed and Gas Mileage
An economist wanted to analyze the relationship between the speed of a car (x) and its gas
mileage (y). As an experiment a car is operated at several different speeds and for each speed the
gas mileage is measured. These data are shown below.
Speed          25   35   45   50   60   65   70
Gas Mileage    40   39   37   33   30   27   25
159.
{Car Speed and Gas Mileage Narrative} Calculate the standard error of estimate, and
describe what this statistic tells you about the regression line.
ANSWER:
s = 1.448; the model's fit to these data is good.
160.
{Car Speed and Gas Mileage Narrative} Do these data provide sufficient evidence at the
5% significance level to infer that a linear relationship exists between higher speeds and
lower gas mileage?
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Rejection region: |t| > t0.025,5 = 2.571
Test statistic: t = -9.754
Conclusion: Reject the null hypothesis. Yes, these data provide sufficient evidence at the
5% significance level to infer that a linear relationship exists between higher speeds and
lower gas mileage.
161.
{Car Speed and Gas Mileage Narrative} Predict with 99% confidence the gas mileage of
a car traveling 55 mph.
ANSWER:
31.236 ± 6.284. Thus, LCL = 24.952, and UCL = 37.520
162.
{Car Speed and Gas Mileage Narrative} Calculate the Pearson coefficient of correlation.
ANSWER:
r = -0.975
163.
{Car Speed and Gas Mileage Narrative} What does the coefficient of correlation tell you
about the direction and strength of the relationship between the two variables?
ANSWER:
There is a very strong negative linear relationship between car speed and gas mileage.

164.
{Car Speed and Gas Mileage} Calculate the coefficient of determination and interpret its
value.
ANSWER:
R^2 = 0.95. This means that 95% of the total variation in gas mileage can be explained by
the speed of the car.
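The car speed and gas mileage answers above can be reproduced from the seven data pairs. The sketch below is a plain-Python illustration using the usual least squares formulas; the function name and the dictionary keys are hypothetical choices made for this example only.

```python
import math

speed = [25, 35, 45, 50, 60, 65, 70]
mileage = [40, 39, 37, 33, 30, 27, 25]

def fit_simple_regression(x, y):
    """Least squares fit with the summary statistics used in this chapter."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = syy - sxy ** 2 / sxx          # sum of squares for error
    s = math.sqrt(sse / (n - 2))        # standard error of estimate
    r = sxy / math.sqrt(sxx * syy)      # coefficient of correlation
    t = b1 / (s / math.sqrt(sxx))       # t statistic for the slope
    return {"n": n, "b0": b0, "b1": b1, "s": s, "r": r, "r2": r ** 2, "t": t}

car = fit_simple_regression(speed, mileage)
print({key: round(value, 4) for key, value in car.items()})
```

With n = 7 the t test has n - 2 = 5 degrees of freedom, which is the value to use when reading the critical point from a t table.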
165.
The following 10 observations of variables x and y were collected.

x     1    2    3    4    5    6    7    8    9   10
y    25   22   21   19   14   15   12   10    6    2
a. Calculate the standard error of estimate.

b. Test to determine if there is enough evidence at the 5% significance level to indicate
that x and y are negatively linearly related.
c. Calculate the coefficient of correlation, and describe what this statistic tells you about
the regression line.
ANSWER:
a. s = 1.322
b. H0: β1 = 0 vs. H1: β1 < 0
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 5%
significance level to indicate that x and y are negatively linearly related.
c. r = -0.9854. This indicates a very strong negative linear relationship between the two
variables.
166.
Consider the following data values of variables x and y.

x     2    4    6    8   10   13
y     7   11   17   21   27   36
a. Calculate the coefficient of determination, and describe what this statistic tells you
about the relationship between the two variables.
b. Calculate the Pearson coefficient of correlation. What sign does it have? Why?
c. What does the coefficient of correlation calculated tell you about the direction and
strength of the relationship between the two variables?
ANSWER:
a. R^2 = 0.995. This means that 99.5% of the variation in the dependent variable y is
explained by the variation in the independent variable x.
b. r = 0.9975. It is positive since the slope of the regression line is positive.
c. There is a very strong (almost perfect) positive linear relationship between the two
variables.

Narrative: Sunshine and Skin Cancer
A medical statistician wanted to examine the relationship between the amount of sunshine (x)
and incidence of skin cancer (y). As an experiment he found the number of skin cancers detected
per 100,000 of population and the average daily sunshine in eight counties around the country.
These data are shown below.
Sunshine (x)       5    7    6    7    8    6    4    3
Skin Cancer (y)    7   11    9   12   15   10    7    5
167.
{Sunshine and Skin Cancer Narrative} Calculate the standard error of estimate, and
describe what this statistic tells you about the regression line.
ANSWER:
168.
{Sunshine and Skin Cancer Narrative} Can we conclude at the 1% significance level that
there is a linear relationship between sunshine and skin cancer?
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Test statistic: t = 8.485
Conclusion: Reject the null hypothesis. Yes, we conclude at the 1% significance level that
there is a linear relationship between sunshine and skin cancer.
169.
{Sunshine and Skin Cancer Narrative} Calculate the coefficient of determination and
interpret it.
ANSWER:
R^2 = 0.9231. This means that 92.31% of the variation in the incidence of skin cancer is
explained by the variation in the amount of sunshine.
170.
{Sunshine and Skin Cancer Narrative} Calculate the Pearson coefficient. What sign does
it have? Why?
ANSWER:
r = 0.9608. It is positive since the slope of the regression line (b1 = 1.846) is positive.
171.
{Sunshine and Skin Cancer Narrative} What does the coefficient of correlation calculated
tell you about the direction and strength of the relationship between the two variables?
ANSWER:
There is a very strong (almost perfect) positive linear relationship between the two
variables.

Narrative: Sales and Experience
Salesperson            1    2    3    4    5    6    7    8    9   10
Years of Experience    0    2   10    3    8    5   12    7   20   15
Sales                  7    9   20   15   18   14   20   17   30   25
172.
{Sales and Experience Narrative} Determine the standard error of estimate and describe
what this statistic tells you about the regression line.
ANSWER:
s = 1.5724; the model's fit is good.
173.
{Sales and Experience Narrative} Determine the coefficient of determination and discuss
what its value tells you about the two variables.
ANSWER:
R^2 = 0.9536, which means that 95.36% of the variation in sales is explained by the
variation in years of experience of the salesperson.
174.
{Sales and Experience Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
r = 0.9765. It has a positive sign since the slope of the regression line (b1 = 1.0817) is
positive.
175.
{Sales and Experience Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between years of experience and sales.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
experience and sales.
176.
{Sales and Experience Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between years of experience
and sales.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
experience and sales.
177.
{Sales and Experience Narrative} Do the tests of ρ and β1 in the previous two questions
provide the same results? Explain.
ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of
course the same conclusion. This is not a coincidence; the two tests are identical.

below.
Education   16   11   15    8   12   10   13   14
Income      58   40   55   35   43   41   52   49
178.
{Income and Education Narrative} Determine the standard error of estimate and describe
what this statistic tells you about the regression line.
ANSWER:

179.
{Income and Education Narrative} Determine the coefficient of determination and

discuss what its value tells you about the two variables.
ANSWER:
R^2 = 0.9223, which means that 92.23% of the variation in income is explained by the
variation in years of education.
180.
{Income and Education Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
positive.
181.
{Income and Education Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between years of education and income.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and income.
182.
{Income and Education Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between years of education
and income.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and income.
183.
{Income and Education Narrative} Do the tests of ρ and β1 in the previous two questions provide
the same results? Explain.
ANSWER:

Contestant              1     2     3     4     5     6     7     8
Years of Education     11    15    12    16    11    16    13    14
Winnings              750   400   600   350   800   300   650   400
184.
{Game Winnings and Education Narrative} Determine the standard error of estimate and
describe what this statistic tells you about the regression line.
ANSWER:
185.
{Game Winnings and Education Narrative} Determine the coefficient of determination

and discuss what its value tells you about the two variables.
ANSWER:
R^2 = 0.9185, which means that 91.85% of the variation in TV game show winnings is
explained by the variation in years of education.
186.
{Game Winnings and Education Narrative} Calculate the Pearson correlation coefficient.
What sign does it have? Why?
ANSWER:
r = -0.9584. It has a negative sign since the slope of the regression line (b1 = -89.1667)
is negative.
187.
{Game Winnings and Education Narrative} Conduct a test of the population coefficient
of correlation to determine at the 5% significance level whether a linear relationship
exists between years of education and TV game shows winnings.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and TV game show winnings.
188.
{Game Winnings and Education Narrative} Conduct a test of the population slope to
determine at the 5% significance level whether a linear relationship exists between years
of education and TV game shows winnings.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of
education and TV game show winnings.
189.
{Game Winnings and Education Narrative} Do the tests of ρ and β1 in the previous two
questions provide the same results? Explain.
ANSWER:
Yes. This is not a coincidence; the two tests are identical.

Movie                                   1     2     3     4     5     6     7     8     9    10
Cost of Two Highest Paid Performers   5.3   7.2   1.3   1.8   3.5   2.6   8.0   2.4   4.5   6.7
Gross Revenue                          48    65    18    20    31    26    73    23    39    58
190.
{Movie Revenues Narrative} Determine the standard error of estimate and describe what
this statistic tells you about the regression line.
ANSWER:
s = 2.0247; the model's fit to these data is good.
191.
{Movie Revenues Narrative} Determine the coefficient of determination and discuss
what its value tells you about the two variables.
ANSWER:
R^2 = 0.9908, which means that 99.08% of the variation in gross revenue is explained by
the variation in payment to the two highest-paid performers.
192.
{Movie Revenues Narrative} Calculate the Pearson correlation coefficient. What sign
does it have? Why?
ANSWER:
positive.
193.
{Movie Revenues Narrative} Conduct a test of the population coefficient of correlation to

determine at the 5% significance level whether a linear relationship exists between
payment to the two highest-paid performers and gross revenue.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between payment
to the two highest-paid performers and gross revenue.
194.
{Movie Revenues Narrative} Conduct a test of the population slope to determine at the
5% significance level whether a linear relationship exists between payment to the two
highest-paid performers and gross revenue.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0

Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between payment
to the two highest-paid performers and gross revenue.
195.
{Movie Revenues Narrative} Do the ρ and β1 tests in the previous questions provide the
same results? Explain.
ANSWER:

FOR QUESTIONS 196 AND 197, USE THE FOLLOWING NARRATIVE:
Narrative: Cost of Books
Book                   1     2     3     4     5     6     7     8     9    10    11    12
Number of Pages      844   727   360   915   295   706   410   905  1058   865   677   912
Selling Price ($)     55    50    35    60    30    50    40    53    65    54    42    58
196.
{Cost of Books Narrative} Determine the coefficient of determination and discuss what
its value tells you.
ANSWER:
R^2 = 0.9378, which means that 93.78% of the variation in the price of books is explained
by the variation in the number of pages.
197.
{Cost of Books Narrative} Can we infer at the 5% significance level that the editor is
correct?
ANSWER:
H0: β1 = 0 vs. H1: β1 > 0

Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that
the editor is correct
Narrative: Automobile Accidents and Precipitation
Day                       1      2      3      4      5      6      7      8      9     10
Precipitation          0.05   0.12   0.05   0.08   0.10   0.35   0.15   0.30   0.10   0.20
Number of Accidents       5      6      2      4      8     14      7     13      7     10
198.
{Automobile Accidents and Precipitation Narrative} Calculate the standard error of

estimate, and describe what this statistic tells you about the regression line.
ANSWER:
s = 1.3207; the model's fit to these data is good.
199.
{Automobile Accidents and Precipitation Narrative} Determine the coefficient of

determination and discuss what its value tells you about the two variables.
ANSWER:
R^2 = 0.893, which means that 89.3% of the variation in the number of accidents is
explained by the variation in the amount of precipitation.
200.
{Automobile Accidents and Precipitation Narrative} Conduct a test of the population

slope to determine whether these data allow us to conclude at the 10% significance level
that the amount of precipitation and the number of accidents are linearly related?
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0

Conclusion: Reject the null hypothesis. Yes, these data allow us to conclude at the 10%
significance level that the amount of precipitation and the number of accidents are
linearly related
201.
{Automobile Accidents and Precipitation Narrative} Conduct a test of the population

coefficient of correlation to determine whether these data allow us to conclude at the 10%
significance level that the amount of precipitation and the number of accidents are
linearly related.
ANSWER:

H0: ρ = 0 vs. H1: ρ ≠ 0
Conclusion: Reject the null hypothesis. Yes, these data allow us to conclude at the 10%
significance level that the amount of precipitation and the number of accidents are
linearly related.
202.
{Automobile Accidents and Precipitation Narrative} Do the β1 and ρ tests in the
previous two questions provide the same results? Explain.
ANSWER:
Yes, the two tests are identical to each other.

Age                   62   57   40   49   67   54   43   65   54   41
Number of Concerts     6    5    4    3    5    5    2    6    3    1
Age                   44   48   55   60   59   63   69   40   38   52
Number of Concerts     3    2    4    5    4    5    4    2    1    3

SUMMARY OUTPUT
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

                      Age        Concerts
Mean                  53         3.65
Standard Error        2.1849     0.3424
Standard Deviation    9.7711     1.5313
Sample Variance       95.4737    2.3447
Count                 20         20

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

              Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept     -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age           0.12569         0.02206           5.69706     0.00002    0.07934      0.1720
203.
{Willie Nelson Concert Narrative} Determine the standard error of estimate and describe
what this statistic tells you about the model's fit.
ANSWER:
s = 0.9396, and since the sample mean ȳ = 3.65, we would have to admit that the
standard error of estimate is not very small. On the other hand, it is not a large number
either. Because there is no predefined upper limit on s, it is difficult in this problem to
assess the model in this way. However, using other criteria, it seems that the model's fit to
these data is reasonable.
204.
{Willie Nelson Concert Narrative} Determine the coefficient of determination and

discuss what its value tells you about the two variables.
ANSWER:
R^2 = 0.64326, which means that 64.326% of the variation in number of concerts attended
is explained by the variation in age of the attendees.
205.
{Willie Nelson Concert Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
r = 0.80204. It has a positive sign since the slope of the regression line, b1, is positive.
206.
{Willie Nelson Concert Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between age and number of concerts attended.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Test statistic: t = r sqrt((n - 2)/(1 - r^2)) = 5.6971
Conclusion: Reject the null hypothesis. Yes
207.
{Willie Nelson Concert Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between age and number of
concerts attended.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0

Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level
that a linear relationship exists between age and number of concerts attended.
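The two Willie Nelson tests can be compared numerically. The sketch below assumes the printed values r = 0.80203, n = 20, b1 = 0.12569, and a slope standard error of 0.02206 taken from the summary output; the helper name is illustrative only.

```python
import math

def t_from_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Test of the population coefficient of correlation
t_rho = t_from_correlation(r=0.80203, n=20)

# Test of the population slope: coefficient divided by its standard error
t_slope = 0.12569 / 0.02206

print(round(t_rho, 3), round(t_slope, 3))
```

Both statistics come out near 5.697; the small gap between them reflects rounding in the printed coefficients rather than any difference between the tests.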
208.
{Willie Nelson Concert Narrative} Do the ρ and β1 tests in the previous two questions
provide the same results? Explain.
ANSWER:
Oil degrees API     27.0   28.5   30.8   31.3   31.9   34.5   34.0   34.7   37.0   41.0   41.0   38.8   39.3
Price per barrel   12.02  12.04  12.32  12.27  12.49  12.70  12.80  13.00  13.00  13.17  13.19  13.22  13.27
A partial statistical software output follows:

Variable    N     Mean      StDev    SE Mean
Degrees     13    34.60     4.613    1.280
Price       13    12.730    0.457    0.127

Covariances
           Degrees      Price
Degrees    21.281667
Price      2.026750     0.208833

Regression Analysis
Predictor    Coef        StDev       T        P
Constant     9.4349      0.2867      32.91    0.000
Degrees      0.095235    0.008220    11.59    0.000
S = 0.1314    R-Sq = 92.46%    R-Sq(adj) = 91.7%

Source            DF    SS        MS        F         P
Regression        1     2.3162    2.3162    134.24    0.000
Residual Error    11    0.1898    0.0173
Total             12    2.5060
209.
{Oil Quality and Price Narrative} Determine the standard error of estimate and describe
what this statistic tells you.
ANSWER:
s = 0.1314. Since the sample mean ȳ = 12.73, the standard error of estimate is judged to
be small, and we may say that the model fits the data well.
210.
{Oil Quality and Price Narrative} Determine the coefficient of determination and discuss
what its value tells you about the two variables.
ANSWER:
R^2 = 0.9246, which means that 92.46% of the variation in the oil price per barrel is
explained by the variation in the API degrees.
211.
{Oil Quality and Price Narrative} Calculate the Pearson correlation coefficient. What
sign does it have? Why?
ANSWER:
r = 0.9616. It has a positive sign since the slope of the regression line, b1, is positive.
212.
{Oil Quality and Price Narrative} Conduct a test of the population coefficient of
correlation to determine at the 5% significance level whether a linear relationship exists
between the quality of oil and price per barrel.
ANSWER:
H0: ρ = 0 vs. H1: ρ ≠ 0
Test statistic: t = r sqrt((n - 2)/(1 - r^2)) = 11.61
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level
that a linear relationship exists between the quality of oil and price per barrel.
213.
{Oil Quality and Price Narrative} Conduct a test of the population slope to determine at
the 5% significance level whether a linear relationship exists between the quality of oil
and price per barrel.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0

Test statistic: t = 11.59 (from Minitab output)
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that
a linear relationship exists between the quality of oil and price per barrel.
214.
{Oil Quality and Price Narrative} Do the ρ and β1 tests in the previous two questions
provide the same results? Explain.
ANSWER:
Yes; both tests have the same value of the test statistic (the small difference between
11.61 and 11.59 is due to rounding in Minitab output), the same rejection region, and of
course the same conclusion.
SECTION 6
215.
In order to estimate with 95% confidence the expected value of y for a given value of x
in a simple linear regression problem, a random sample of 10 observations is taken.
Which of the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: b
216.
Given a specific value of x and confidence level, which of the following statements is
correct?
a. The confidence interval estimate of the expected value of y can be calculated but the
prediction interval of y for the given value of x cannot be calculated.
b. The confidence interval estimate of the expected value of y will be wider than the
prediction interval.
c. The prediction interval of y for the given value of x can be calculated but the
confidence interval estimate of the expected value of y cannot be calculated.
d. The confidence interval estimate of the expected value of y will be narrower than the
prediction interval.
ANSWER: d
217.
In order to predict with 90% confidence the expected value of y for a given value of x in a
simple linear regression problem, a random sample of 10 observations is taken. Which of
the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: c
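The table-value questions above come down to two numbers: the degrees of freedom n - 2 and the upper-tail area (1 - confidence)/2. The sketch below shows that lookup in Python; the small dictionary holds only a few standard t-table entries and the function name is a hypothetical choice for this example.

```python
# A few upper-tail critical values t(tail area, df) from a standard t table.
T_TABLE = {
    (0.05, 8): 1.860,
    (0.025, 8): 2.306,
    (0.05, 10): 1.812,
    (0.025, 10): 2.228,
}

def t_lookup_key(confidence, n):
    """Return (upper-tail area, degrees of freedom) for simple linear regression."""
    alpha = 1 - confidence
    return (round(alpha / 2, 4), n - 2)

key_95 = t_lookup_key(0.95, 10)   # 95% interval, n = 10
key_90 = t_lookup_key(0.90, 10)   # 90% interval, n = 10

print(key_95, T_TABLE[key_95])
print(key_90, T_TABLE[key_90])
```

The distractors 2.228 and 1.812 in these questions are the values for 10 degrees of freedom, which is what results from forgetting to subtract 2 from the sample size.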
218.
The confidence interval estimate of the expected value of y for a given value of x,
compared to the prediction interval of y for the same given value of x and confidence
level, will be
a. wider
b. narrower
c. the same
d. impossible to know
ANSWER: b

219.
a. 1.860
b. 2.306
c. 2.896
d. 3.355
ANSWER: d
220.
The width of the confidence interval estimate for the predicted value of y depends on
a. the standard error of the estimate
b. the value of x for which the prediction is being made
c. the sample size
d. All of the above
ANSWER: d
221.
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: a
222.
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: d
223.
In developing a 95% confidence interval for the expected value of y from a simple linear
regression problem involving a sample of size 10, the appropriate table value would be
1.86.
ANSWER: F
224.
In developing an 80% prediction interval for the particular value of y from a simple linear
1.372
ANSWER: T
225.
In developing a 90% prediction interval for the particular value of y from a simple linear
2.179
ANSWER: F
226.
In order to predict with 95% confidence a particular value of y for a given value of x in
a simple linear regression problem, a random sample of 20 observations is taken. The
appropriate table value that would be used is 2.101.
ANSWER: T
227.
The confidence interval estimate of the expected value of y will be narrower than the
prediction interval for the same given value of x and confidence level. This is because
there is less error in estimating a mean value as opposed to predicting an individual value.
ANSWER: T
228.
The confidence interval estimate of the expected value of y will be wider than the
prediction interval for the same given value of x and confidence level. This is because
there is more error in estimating a mean value as opposed to predicting an individual
value.
ANSWER: F
229.
1.761.
ANSWER: F
230.
2.807
ANSWER: T
231.
The prediction interval for a particular value of y is always wider than the confidence
interval for mean value of y, given the same data set, x value, and confidence level.
ANSWER: T

BASIC TECHNIQUES & APPLIED QUESTIONS
232.
A medical statistician wanted to examine the relationship between the amount of sunshine
(x) and incidence of skin cancer (y). As an experiment he found the number of skin
cancers detected per 100,000 of population and the average daily sunshine in eight
counties around the country. These data are shown below.
Sunshine (x)       5    7    6    7    8    6    4    3
Skin Cancer (y)    7   11    9   12   15   10    7    5
Predict with 95% confidence the skin cancers per 100,000 in a county with a daily
average of 6.5 hours of sunshine.
ANSWER:
10.884 ± 2.525. Thus, LCL = 8.359, and UCL = 13.409
Salesperson            1    2    3    4    5    6    7    8    9   10
Years of Experience    0    2   10    3    8    5   12    7   20   15
Sales                  7    9   20   15   18   14   20   17   30   25
233.
{Sales and Experience Narrative} Predict with 95% confidence the monthly sales of a
salesperson with 10 years of experience.
ANSWER:
19.447 ± 3.819. Thus, LCL = 15.628 (in $1000s), and UCL = 23.266 (in $1000s)
234.
{Sales and Experience Narrative} Estimate with 95% confidence the average monthly
sales of all salespersons with 10 years of experience.
ANSWER:
19.447 ± 1.199. Thus, LCL = 18.248 (in $1000s), and UCL = 20.646 (in $1000s)
235.
{Sales and Experience Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (10 years) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (10 years) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.

below.
Education   16   11   15    8   12   10   13   14
Income      58   40   55   35   43   41   52   49
236.
{Income and Education Narrative} Predict with 95% confidence the income of an
individual with 10 years of education.
ANSWER:
39.715 ± 6.632. Thus, LCL = 33.083 (in $1000s), and UCL = 46.347 (in $1000s)
237.
{Income and Education Narrative} Estimate with 95% confidence the average income of
all individuals with 10 years of education.
ANSWER:
39.715 ± 2.908. Thus, LCL = 36.807 (in $1000s), and UCL = 42.623 (in $1000s)
238.
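{Income and Education Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x (10 years) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x (10 years) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.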

Contestant              1     2     3     4     5     6     7     8
Years of Education     11    15    12    16    11    16    13    14
Winnings              750   400   600   350   800   300   650   400
239.
{Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a
contestant who has 15 years of education.
ANSWER:
397.500 ± 159.213. Thus, LCL = $238.287, and UCL = $556.713
240.
{Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a
contestant who has 10 years of education.
ANSWER:
843.333 ± 179.971. Thus, LCL = $663.362, and UCL = $1,023.304
241.
{Game Winnings and Education Narrative} Estimate with 95% confidence the average
winnings of all contestants who have 15 years of education.
ANSWER:
397.500 ± 64.998. Thus, LCL = $332.502, and UCL = $462.498
242.
{Game Winnings and Education Narrative} Estimate with 95% confidence the average
winnings of all contestants who have 10 years of education.
ANSWER:
843.333 ± 106.141. Thus, LCL = $737.192, and UCL = $949.474
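The four interval questions above use the same fitted line and differ only in the given x and in whether the extra "1 +" term appears under the square root. The following Python sketch works through the game winnings data; the critical value 2.447 is t(0.025, 6) read from a t table and hard-coded here as an assumption, and the function name is illustrative.

```python
import math

education = [11, 15, 12, 16, 11, 16, 13, 14]
winnings = [750, 400, 600, 350, 800, 300, 650, 400]
T_CRIT = 2.447  # assumed table value t(0.025, 6) for n = 8, 95% confidence

def interval_half_widths(x, y, x_given, t_crit):
    """Return (point prediction, prediction half-width, confidence half-width)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    y_hat = b0 + b1 * x_given
    core = 1 / n + (x_given - x_bar) ** 2 / sxx
    prediction = t_crit * s * math.sqrt(1 + core)   # individual value of y
    confidence = t_crit * s * math.sqrt(core)       # expected value of y
    return y_hat, prediction, confidence

y_hat_15, pi_15, ci_15 = interval_half_widths(education, winnings, 15, T_CRIT)
y_hat_10, pi_10, ci_10 = interval_half_widths(education, winnings, 10, T_CRIT)

print(round(y_hat_15, 3), round(pi_15, 3), round(ci_15, 3))
print(round(y_hat_10, 3), round(pi_10, 3), round(ci_10, 3))
```

Note that x = 10 lies outside the observed range of 11 to 16 years, so both intervals at that value are extrapolations and are wider than those at x = 15.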

Movie                                   1     2     3     4     5     6     7     8     9    10
Cost of Two Highest Paid Performers   5.3   7.2   1.3   1.8   3.5   2.6   8.0   2.4   4.5   6.7
Gross Revenue                          48    65    18    20    31    26    73    23    39    58
243.
{Movie Revenues Narrative} Predict with 95% confidence the gross revenue of a movie
whose top two stars earn $5.0 million.
ANSWER:
45.65 ± 4.916. Thus, LCL = 40.734 (in $1,000,000s), and UCL = 50.566 (in
$1,000,000s)
244.
{Movie Revenues Narrative} Estimate with 95% confidence the average gross revenue of
a movie whose top two stars earn $5.0 million.
ANSWER:
45.65 ± 1.54. Thus, LCL = 44.11 (in $1,000,000s), and UCL = 47.19 (in $1,000,000s)
245.
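{Movie Revenues Narrative} Which interval in the previous two questions is narrower:
the confidence interval estimate of the expected value of y or the prediction interval for
the same given value of x ($5.0 million) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower than the
prediction interval for the same given value of x ($5.0 million) and same confidence level.
This is because there is less error in estimating a mean value as opposed to predicting an
individual value.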

Narrative: Cost of Books

Book    Number of Pages    Selling Price ($)
1       844                55
2       727                50
3       360                35
4       915                60
5       295                30
6       706                50
7       410                40
8       905                53
9       1058               65
10      865                54
11      677                42
12      912                58
246.
{Cost of Books Narrative} Predict with 90% confidence the selling price of a book with
900 pages.
ANSWER:
56.647 ± 5.311. Thus, LCL = $51.336, and UCL = $61.958
247.
{Cost of Books Narrative} Estimate with 90% confidence the average selling price of all
books with 900 pages.
ANSWER:
56.647 ± 1.803. Thus, LCL = $54.844, and UCL = $58.450
248.
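{Cost of Books Narrative} Which interval in the previous two questions is narrower: the
confidence interval estimate of the expected value of y or the prediction interval for the
same given value of x (900 pages) and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower, because there is
less error in estimating a mean value than in predicting an individual value.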

Narrative: Automobile Accidents and Precipitation
Day    Precipitation (inches)    Number of Accidents
1      0.05                      5
2      0.12                      6
3      0.05                      2
4      0.08                      4
5      0.10                      8
6      0.35                      14
7      0.15                      7
8      0.30                      13
9      0.10                      7
10     0.20                      10
249.
{Automobile Accidents and Precipitation Narrative} Predict with 95% confidence the
number of accidents that occur when there is 0.40 inches of rain.
ANSWER:
16.316 ± 4.032. Thus, LCL = 12.284, and UCL = 20.348
250.
{Automobile Accidents and Precipitation Narrative} Estimate with 95% confidence the
average daily number of accidents when the daily precipitation is 0.25 inches.
ANSWER:
11.086 ± 1.377. Thus, LCL = 9.709, and UCL = 12.463
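Both intervals also widen as the given value of x moves away from the sample mean of x. A minimal sketch with the accident data follows. It assumes the critical value 2.306 (t with 8 degrees of freedom, 95% two-tailed) from a t table, and the helper `margins` is a hypothetical name.

```python
import math

# Automobile Accidents and Precipitation data from the narrative table.
precipitation = [0.05, 0.12, 0.05, 0.08, 0.10, 0.35, 0.15, 0.30, 0.10, 0.20]
accidents = [5, 6, 2, 4, 8, 14, 7, 13, 7, 10]

# Assumption: two-tailed 95% critical value of t with 8 degrees of freedom.
T_CRIT = 2.306

n = len(precipitation)
x_bar = sum(precipitation) / n
y_bar = sum(accidents) / n
sxx = sum((x - x_bar) ** 2 for x in precipitation)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(precipitation, accidents))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(precipitation, accidents))
s_e = math.sqrt(sse / (n - 2))  # standard error of estimate


def margins(x_g):
    """Return (prediction margin, confidence margin) at x_g."""
    distance = 1 / n + (x_g - x_bar) ** 2 / sxx
    prediction = T_CRIT * s_e * math.sqrt(1 + distance)
    confidence = T_CRIT * s_e * math.sqrt(distance)
    return prediction, confidence


# The sample mean of x is 0.15, so the margins are smallest there.
for x_g in (0.15, 0.25, 0.40):
    pi_m, ci_m = margins(x_g)
    print(f"x = {x_g:.2f}: y_hat = {b0 + b1 * x_g:.3f}, "
          f"prediction margin = {pi_m:.3f}, confidence margin = {ci_m:.3f}")
```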
251.
{Automobile Accidents and Precipitation Narrative} Which interval is narrower: the
confidence interval estimate of the expected value of y or the prediction interval for the
same given value of x and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower, because there is
less error in estimating a mean value than in predicting an individual value.


Age    Number of Concerts        Age    Number of Concerts
62     6                         44     3
57     5                         48     2
40     4                         55     4
49     3                         60     5
67     5                         59     4
54     5                         63     5
43     2                         69     4
65     6                         40     2
54     3                         38     1
41     1                         52     3

SUMMARY OUTPUT
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

                      Age         Concerts
Mean                  53          3.65
Standard Error        2.1849      0.3424
Standard Deviation    9.7711      1.5313
Sample Variance       95.4737     2.3447
Count                 20          20

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

             Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept    -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age          0.12569         0.02206           5.69706     0.00002    0.07934      0.1720
252.
{Willie Nelson Concert Narrative} Predict with 95% confidence the number of concerts
attended by a 45-year-old individual.
ANSWER:
2.645 ± 2.057. Thus, LCL = 0.588, and UCL = 4.702
253.
{Willie Nelson Concert Narrative} Estimate with 95% confidence the average number of
concerts attended by all 45 year-old individuals.
ANSWER:
2.645 ± 0.577. Thus, LCL = 2.068, and UCL = 3.222
254.
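{Willie Nelson Concert Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower, because there is
less error in estimating a mean value than in predicting an individual value.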

Oil degrees API    Price per barrel (in $)
27.0               12.02
28.5               12.04
30.8               12.32
31.3               12.27
31.9               12.49
34.5               12.70
34.0               12.80
34.7               13.00
37.0               13.00
41.0               13.17
41.0               13.19
38.8               13.22
39.3               13.27

Variable    N     Mean      StDev    SE Mean
Degrees     13    34.60     4.613    1.280
Price       13    12.730    0.457    0.127

Covariances
           Degrees      Price
Degrees    21.281667
Price      2.026750     0.208833

Regression Analysis
Predictor    Coef        StDev       T        P
Constant     9.4349      0.2867      32.91    0.000
Degrees      0.095235    0.008220    11.59    0.000

S = 0.1314    R-Sq = 92.46%    R-Sq(adj) = 91.7%

Source            DF    SS        MS        F         P
Regression        1     2.3162    2.3162    134.24    0.000
Residual Error    11    0.1898    0.0173
Total             12    2.5060

255.
{Oil Quality and Price Narrative} Predict with 95% confidence the oil price per barrel for
an API degree of 35.
ANSWER:
12.768 ± (2.201)(0.1314)(1.038) = 12.768 ± 0.300. Thus, LCL = 12.468, and UCL =
13.068
256.
{Oil Quality and Price Narrative} Estimate with 95% confidence the average oil price per
barrel for an API degree of 35.
ANSWER:
12.768 ± (2.201)(0.1314)(0.2785) = 12.768 ± 0.081. Thus, LCL = 12.687, and UCL =
12.849
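The factors 1.038 and 0.2785 in the two answers above come from the square-root terms of the prediction and confidence interval formulas. The sketch below recomputes them from the Minitab summary values alone (n, the mean and variance of Degrees, S, and the coefficients). The variable names are illustrative, and the critical value 2.201 is taken from the answers.

```python
import math

# Summary values copied from the Minitab output for the oil data.
n = 13
x_bar = 34.60            # mean of Degrees
var_x = 21.281667        # sample variance of Degrees (covariance table)
s_e = 0.1314             # S, the standard error of estimate
b0, b1 = 9.4349, 0.095235
t_crit = 2.201           # critical value used in the answers (df = 11)

x_g = 35                 # given API degree
y_hat = b0 + b1 * x_g

# Sum of squared deviations of x, recovered from the sample variance.
sxx = (n - 1) * var_x
distance = 1 / n + (x_g - x_bar) ** 2 / sxx

pi_factor = math.sqrt(1 + distance)   # prediction interval factor
ci_factor = math.sqrt(distance)       # confidence interval factor
pi_margin = t_crit * s_e * pi_factor
ci_margin = t_crit * s_e * ci_factor

print(f"y_hat = {y_hat:.3f}")
print(f"prediction: factor = {pi_factor:.3f}, margin = {pi_margin:.3f}")
print(f"confidence: factor = {ci_factor:.4f}, margin = {ci_margin:.3f}")
```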
257.
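{Oil Quality and Price Narrative} Which interval in the previous two questions is
narrower: the confidence interval estimate of the expected value of y or the prediction
interval for the same given value of x and same confidence level? Why?
ANSWER:
The confidence interval estimate of the expected value of y is narrower, because there is
less error in estimating a mean value than in predicting an individual value.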
SECTION 7
258.
The standardized residual is defined as:
a. residual divided by the standard error of estimate
b. residual multiplied by the square root of the standard error of estimate
c. residual divided by the square of the standard error of estimate
d. residual multiplied by the standard error of estimate
ANSWER: a
259.
The least squares method requires that the variance σ² of the error variable ε is a
constant no matter what the value of x is. When this requirement is violated, the
condition is called:
a. non-independence of ε
b. homoscedasticity
c. heteroscedasticity
d. influential observation
ANSWER: c
260.
When the variance σ² of the error variable ε is a constant no matter what the value of x
is, this condition is called:
a. homocausality
b. heteroscedasticity
c. homoscedasticity
d. heterocausality
ANSWER: c
261.
If the plot of the residuals is fan shaped, which assumption of regression analysis is
violated?
a. Normality
b. Homoscedasticity
c. Independence of errors
d. No assumptions are violated, the graph should resemble a fan
ANSWER: b
262.
In regression analysis we use the Spearman rank correlation coefficient to measure and
test to determine whether a relationship exists between the two variables if
a. one or both variables may be ordinal
b. both variables are interval but the normality requirement is not met
c. both (a) and (b)
d. neither (a) nor (b)
ANSWER: c
263.
The sample Spearman rank correlation coefficient, where a and b are the ranks of x and y,
respectively, is given by
a. rs = cov(a, b) / sa / sb
b. rs = cov(a, b) / sa sb
c. rs = cov(a, b) / sa sb
d. rs = cov(a, b) / (sa sb)
ANSWER: d
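The Spearman coefficient is the covariance of the ranks divided by the product of the rank standard deviations. A minimal sketch follows, assuming tied values receive the average of their ranks. The data and the helper names `average_ranks` and `spearman` are hypothetical and not taken from the source.

```python
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """r_s = cov(a, b) / (s_a * s_b), where a and b are the ranks of x and y."""
    a, b = average_ranks(x), average_ranks(y)
    a_bar, b_bar = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman(x, y), 4))
```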
264.
The variance σ² of the error variable ε is required to be constant. When this requirement
is satisfied, the condition is called homoscedasticity.
ANSWER: T
265.
The variance σ² of the error variable ε is required to be constant. When this requirement
is violated, the condition is called heteroscedasticity.
ANSWER: T
266.
We standardize residuals in the same way we standardize all variables, by subtracting the
mean and dividing by the variance.
ANSWER: F
267.
An outlier is an observation that is unusually small or unusually large.

ANSWER: T
268.
One method of diagnosing heteroscedasticity is to plot the residuals against the predicted
values of y, then look for a change in the spread of the plotted values.
ANSWER: T
269.
Regardless of the value of x, the standard deviation of the distribution of y values about
the regression line is the same. This assumption of equal standard deviations about the
regression line is called residual analysis.
ANSWER: F
270.
Data that exhibit an autocorrelation effect violate the regression assumption of
independence.
ANSWER: T
271.
When n is greater than 30, the sample Spearman rank correlation coefficient rs is
approximately normally distributed with mean of 0 and standard deviation of 1.
ANSWER: F
272.
Given that n = 37, and the value of sample Spearman rank correlation coefficient rs =
0.35, the value of the test statistic for testing H0: ρs = 0 is z = 2.10
ANSWER: T
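For n greater than 30, rs is approximately normal with mean 0 and standard deviation 1/sqrt(n - 1), so the test statistic is z = rs * sqrt(n - 1). A short sketch of the calculation in the question above (the function name `spearman_z` is illustrative):

```python
import math


def spearman_z(r_s, n):
    """Large-sample test statistic for H0: rho_s = 0 (intended for n > 30)."""
    return r_s * math.sqrt(n - 1)


z = spearman_z(0.35, 37)
print(f"z = {z:.2f}")
```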
273.
Another name for Pearson coefficient of correlation is the Spearman rank correlation
coefficient.
ANSWER: F

Salesperson    Years of Experience    Sales
1              0                      7
2              2                      9
3              10                     20
4              3                      15
5              8                      18
6              5                      14
7              12                     20
8              7                      17
9              20                     30
10             15                     25
274.
{Sales and Experience Narrative} Use the regression equation ŷ = 8.63 + 1.0817x to
determine the predicted values of y.
ANSWER:
ŷ: 8.630, 10.793, 19.447, 11.875, 17.284, 14.039, 21.610, 16.202, 30.264, and 24.856
275.
{Sales and Experience Narrative} Use the predicted and actual values of y to calculate the
residuals.
ANSWER:
ri: -1.630, -1.793, 0.553, 3.125, 0.716, -0.039, -1.610, 0.798, -0.264, and 0.144
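Each residual is the actual value of y minus the predicted value from the regression equation. A minimal sketch with the Sales and Experience data and the coefficients given in the question:

```python
# Sales and Experience data from the narrative table.
experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]

# Coefficients of the regression equation given in the question.
B0, B1 = 8.63, 1.0817

predicted = [B0 + B1 * x for x in experience]
residuals = [y - y_hat for y, y_hat in zip(sales, predicted)]

for x, y, y_hat, r in zip(experience, sales, predicted, residuals):
    print(f"x = {x:2d}  y = {y:2d}  y_hat = {y_hat:7.3f}  residual = {r:7.3f}")
```

With a least squares fit the residuals sum to zero apart from rounding in the coefficients, which gives a quick check on the signs.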
276.
{Sales and Experience Narrative} Plot the residuals against the predicted values of y.
Does the variance appear to be constant?
ANSWER:
[Figure: Residuals versus Predicted. Vertical axis: Residuals; horizontal axis: Predicted Values.]
It appears that heteroscedasticity is not a problem.

277.
{Sales and Experience Narrative} Compute the standardized residuals.

ANSWER:
-1.100, -1.210, 0.373, 2.108, 0.483, -0.026, -1.086, 0.538, -0.178, and 0.097
278.
{Sales and Experience Narrative} Identify possible outliers.

ANSWER:
The point (3, 15) is a possible outlier since its standardized residual 2.108 exceeds 2.0.
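The standardized residuals and the outlier check can be reproduced as sketched below. The answer's figures are matched when each residual is divided by the sample standard deviation of the residuals (n - 1 denominator). That divisor is inferred from the numbers and is an assumption, since the source does not state it; dividing by the standard error of estimate (n - 2 denominator) gives slightly smaller values.

```python
import statistics

# Sales and Experience data from the narrative table.
experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]
B0, B1 = 8.63, 1.0817  # regression equation given in the narrative

residuals = [y - (B0 + B1 * x) for x, y in zip(experience, sales)]

# Assumption: divide by the sample standard deviation of the residuals,
# which reproduces the standardized residuals listed in the answer.
spread = statistics.stdev(residuals)
standardized = [r / spread for r in residuals]

# Flag observations whose standardized residual exceeds 2.0 in absolute value.
outliers = [
    (x, y, round(z, 3))
    for x, y, z in zip(experience, sales, standardized)
    if abs(z) > 2.0
]
print(outliers)
```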

below.
Education (years)    Income ($1000s)
16                   58
11                   40
15                   55
8                    35
12                   43
10                   41
13                   52
14                   49
279.
{Income and Education Narrative} Use the regression equation ŷ = 10.6165 + 2.9098x to
determine the predicted values of y.
ANSWER:
ŷ: 57.173, 42.624, 54.263, 33.895, 45.534, 39.714, 48.444, and 51.353
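The regression equation used above can be recovered from the eight data pairs with the usual least squares formulas, b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. A minimal sketch (the function name `least_squares` is illustrative):

```python
# Income and Education data: years of education and income in $1000s.
education = [16, 11, 15, 8, 12, 10, 13, 14]
income = [58, 40, 55, 35, 43, 41, 52, 49]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(education, income)
predicted = [b0 + b1 * x for x in education]
print(f"y_hat = {b0:.4f} + {b1:.4f}x")
print([round(p, 3) for p in predicted])
```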

280.
{Income and Education Narrative} Use the predicted and actual values of y to calculate
the residuals.
ANSWER:
ri: 0.827, -2.624, 0.737, 1.105, -2.534, 1.286, 3.556, and -2.353.
281.
{Income and Education Narrative} Plot the residuals against the predicted values of y.
Does the variance appear to be constant?
ANSWER:
[Figure: Residuals plotted against predicted values. Vertical axis: Residuals; horizontal axis: Predicted Values.]

282.
{Income and Education Narrative} Compute the standardized residuals.

ANSWER:
0.367, -1.164, 0.327, 0.490, -1.124, 0.570, 1.577, and -1.044
283.
{Income and Education Narrative} Identify possible outliers.

ANSWER:
No outliers exist, since no observation has a standardized residual whose absolute value
exceeds 2.0.

Contestant    Years of Education    Winnings
1             11                    750
2             15                    400
3             12                    600
4             16                    350
5             11                    800
6             16                    300
7             13                    650
8             14                    400
284.
{Game Winnings and Education Narrative} Use the regression equation
ŷ = 1735 - 89.1667x to determine the predicted values of y.
ANSWER:
ŷ: 754.167, 397.500, 665.000, 308.333, 754.167, 308.333, 575.833, and 486.667
285.
{Game Winnings and Education Narrative} Use the predicted and actual values of y to
calculate the residuals.
ANSWER:
ri: -4.167, 2.500, -65.000, 41.667, 45.833, -8.333, 74.167, and -86.667
286.
{Game Winnings and Education Narrative} Plot the residuals against the predicted values
ŷ. Does the variance appear to be constant?
ANSWER:
[Figure: Residuals plotted against predicted values. Vertical axis: Residuals; horizontal axis: Predicted Values.]
The variance appears to be constant.

287.
{Game Winnings and Education Narrative} Compute the standardized residuals.

ANSWER:
The standardized residuals are: -0.076, 0.045, -1.182, 0.758, 0.833, -0.152, 1.349, and
-1.576.
288.
{Game Winnings and Education Narrative} Identify possible outliers.

ANSWER:
No outliers exist, since no observation has a standardized residual whose absolute value
exceeds 2.0.

Movie    Cost of Two Highest Paid Performers ($ millions)    Gross Revenue ($ millions)
1        5.3                                                 48
2        7.2                                                 65
3        1.3                                                 18
4        1.8                                                 20
5        3.5                                                 31
6        2.6                                                 26
7        8.0                                                 73
8        2.4                                                 23
9        4.5                                                 39
10       6.7                                                 58
289.
{Movie Revenues Narrative} Use the regression equation ŷ = 4.225 + 8.285x to determine
the predicted values of y.
ANSWER:
ŷ: 48.137, 63.878, 14.996, 19.139, 33.223, 25.767, 70.506, 24.110, 41.508, and 59.736.
290.
{Movie Revenues Narrative} Use the predicted and actual values of y to calculate the
residuals.
ANSWER:
ri: -0.137, 1.122, 3.004, 0.861, -2.223, 0.233, 2.494, -1.110, -2.508, and -1.736
291.
{Movie Revenues Narrative} Plot the residuals against the predicted values of y. Does the
variance appear to be constant?
ANSWER:
[Figure: Residuals plotted against predicted values. Vertical axis: Residuals; horizontal axis: Predicted Values.]

292.
{Movie Revenues Narrative} Compute the standardized residuals.

ANSWER:
The standardized residuals are: -0.072, 0.588, 1.574, 0.451, -1.165, 0.122, 1.306, -0.581,
-1.314, and -0.909.
293.
{Movie Revenues Narrative} Identify possible outliers.

ANSWER:
No outliers exist, since no observation has standardized residual whose absolute value
exceeds 2.0.

Age    Number of Concerts        Age    Number of Concerts
62     6                         44     3
57     5                         48     2
40     4                         55     4
49     3                         60     5
67     5                         59     4
54     5                         63     5
43     2                         69     4
65     6                         40     2
54     3                         38     1
41     1                         52     3

SUMMARY OUTPUT
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

                      Age         Concerts
Mean                  53          3.65
Standard Error        2.1849      0.3424
Standard Deviation    9.7711      1.5313
Sample Variance       95.4737     2.3447
Count                 20          20

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

             Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept    -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age          0.12569         0.02206           5.69706     0.00002    0.07934      0.1720
294.
{Willie Nelson Concert Narrative} Use the regression equation ŷ = -3.0115 + 0.1257x to
determine the predicted values of y.
ANSWER:
The predicted values ŷ are:
4.781   4.153   2.016   3.147   5.410   3.776   2.393   5.158   3.776   2.142
2.519   3.022   3.901   4.530   4.404   4.907   5.661   2.016   1.765   3.524
295.
{Willie Nelson Concert Narrative} Use the predicted values and the actual values of y to
calculate the residuals.
ANSWER:
The residuals r = y - ŷ are:
 1.219    0.847    1.984   -0.147   -0.410    1.224   -0.393    0.842   -0.776   -1.142
 0.481   -1.022    0.099    0.470   -0.404    0.093   -1.661   -0.016   -0.765   -0.524
296.
{Willie Nelson Concert Narrative} Plot the residuals against the predicted values ŷ.
ANSWER:
[Figure: Residuals plotted against predicted values. Vertical axis: Residuals; horizontal axis: Predicted.]
297.
{Willie Nelson Concert Narrative} Does it appear that heteroscedasticity is a problem?
Explain.
ANSWER:
The variance of the error variable appears to be constant; therefore heteroscedasticity is
not a problem.
298.
{Willie Nelson Concert Narrative} Draw a histogram of the residuals.

ANSWER:
[Figure: Histogram of the residuals. Vertical axis: Frequency; horizontal axis: Residuals.]
299.
{Willie Nelson Concert Narrative} Does it appear that the errors are normally
distributed? Explain.

ANSWER:
The histogram is positively skewed. The errors may not be normally distributed.
300.
{Willie Nelson Concert Narrative} Use the residuals to compute the standardized
residuals.
ANSWER:
The standardized residuals r / s are:
 1.297    0.902    2.111   -0.157   -0.436    1.303   -0.418    0.896   -0.826   -1.215
 0.512   -1.087    0.105    0.500   -0.430    0.099   -1.768   -0.017   -0.814   -0.558
301.
{Willie Nelson Concert Narrative} Identify possible outliers.

ANSWER:
The third observation (age 40, 4 concerts) is a possible outlier, since its standardized
residual 2.111 exceeds 2.0. None of the other 19 observations has a standardized residual
whose absolute value exceeds 2.0.

Oil degrees API    Price per barrel (in $)
27.0               12.02
28.5               12.04
30.8               12.32
31.3               12.27
31.9               12.49
34.5               12.70
34.0               12.80
34.7               13.00
37.0               13.00
41.0               13.17
41.0               13.19
38.8               13.22
39.3               13.27

Variable    N     Mean      StDev    SE Mean
Degrees     13    34.60     4.613    1.280
Price       13    12.730    0.457    0.127

Covariances
           Degrees      Price
Degrees    21.281667
Price      2.026750     0.208833

Regression Analysis
Predictor    Coef        StDev       T        P
Constant     9.4349      0.2867      32.91    0.000
Degrees      0.095235    0.008220    11.59    0.000

S = 0.1314    R-Sq = 92.46%    R-Sq(adj) = 91.7%

Source            DF    SS        MS        F         P
Regression        1     2.3162    2.3162    134.24    0.000
Residual Error    11    0.1898    0.0173
Total             12    2.5060
302.
{Oil Quality and Price Narrative} Use the regression equation ŷ = 9.4349 + 0.095235x to
determine the predicted values of y.
ANSWER:
The predicted values ŷ are: 12.006, 12.149, 12.368, 12.416, 12.473, 12.721, 12.673,
12.740, 12.959, 13.340, 13.340, 13.130, and 13.178.
303.
{Oil Quality and Price Narrative} Use the predicted values and the actual values of y to
calculate the residuals.
ANSWER:
The residuals r = y - ŷ are: 0.014, -0.109, -0.048, -0.146, 0.017, -0.021, 0.127, 0.260,
0.041, -0.170, -0.150, 0.090, and 0.092.
304.
{Oil Quality and Price Narrative} Plot the residuals against the predicted values ŷ.
ANSWER:
[Figure: Residuals Versus the Fitted Values (response is Price). Vertical axis: Residual; horizontal axis: Fitted Value.]
305.
{Oil Quality and Price Narrative} Does it appear that heteroscedasticity is a problem?
Explain.
ANSWER:
The variance of the error variable appears to be constant; therefore heteroscedasticity is
not a problem.
306.
{Oil Quality and Price Narrative} Draw a histogram of the residuals.

ANSWER:
[Figure: Histogram of the Residuals (response is Price). Vertical axis: Frequency; horizontal axis: Residual.]
307.
{Oil Quality and Price Narrative} Does it appear that the errors are normally distributed?
Explain.
ANSWER:
The histogram is fairly symmetric; therefore we may conclude that the errors are
normally distributed.
308.
{Oil Quality and Price Narrative} Use the residuals to compute the standardized
residuals.
ANSWER:
The standardized residuals r / s are: 0.105, -0.830, -0.366, -1.109, 0.130, -0.156,
0.967, 1.982, 0.315, -1.290, -1.138, 0.685, and 0.703.
309.
{Oil Quality and Price Narrative} Identify possible outliers.

ANSWER:
There are no outliers since none of the 13 observations has a standardized residual whose
absolute value exceeds 2.0. However, observation 8 with standardized residual of 1.982
may be an outlier.
