Simple Linear Regression

This problem set covers simple linear regression in five exercises: comparing how strongly two midterm exams (Exam 1 and Exam 2) correlate with the overall course grade; describing the relationship between height and shoulder girth; predicting protein from calories in Starbucks food menu items, with calories as the predictor and protein as the outcome, and reading a residuals vs. predicted plot; modeling percent below the poverty level from unemployment rate, including interpreting the intercept, the slope, and R², and calculating the correlation coefficient (about 0.68, from R² = 46%); and describing the relationship between percent who own their home and percent urban population, where District of Columbia is an outlier.

Uploaded by

neha

Problem Set 11

Simple Linear Regression

1. The two scatterplots below show the relationship between the overall course average and two midterm
exams (Exam 1 and Exam 2) recorded for 233 students during several years for a statistics course at a
university.
[Figure: two scatterplots. Left: Course grade vs. Exam 1 grade. Right: Course grade vs. Exam 2 grade.]

a. Based on these graphs, which of the two exams has the stronger correlation with the course grade?
Explain.
b. Can you think of a reason why the correlation between the exam you chose in part (a) and the
course grade is higher?
2. Researchers studying anthropometry collected body and skeletal diameter measurements, as well as age,
weight, height and sex for 507 physically active individuals. The scatterplot below shows the relationship
between height and shoulder girth (circumference of shoulders measured over deltoid muscles), both
measured in centimeters.
[Figure: scatterplot of Height (cm) vs. Shoulder girth (cm).]
a. Describe the relationship between shoulder girth and height.
b. How would the relationship change if shoulder girth were measured in inches while the units of
height remained in centimeters?
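
As a hedged illustration of the unit question in part (b), the sketch below computes the correlation and the least-squares slope before and after converting the predictor from centimeters to inches. The six data points are made up for illustration and are not the study's measurements; `pearson_r` and `ols_slope` are hypothetical helper names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up values for illustration only; NOT the data from the study.
girth_cm = [95.0, 102.0, 108.0, 115.0, 121.0, 127.0]
height_cm = [158.0, 165.0, 171.0, 170.0, 180.0, 186.0]

# 1 inch = 2.54 cm, so dividing by 2.54 converts centimeters to inches.
girth_in = [g / 2.54 for g in girth_cm]

r_cm = pearson_r(girth_cm, height_cm)
r_in = pearson_r(girth_in, height_cm)
slope_cm = ols_slope(girth_cm, height_cm)
slope_in = ols_slope(girth_in, height_cm)

print(f"r (cm) = {r_cm:.4f}, r (in) = {r_in:.4f}")
print(f"slope per cm = {slope_cm:.4f}, slope per inch = {slope_in:.4f}")
```

With any data, rescaling the predictor by a positive constant leaves the correlation unchanged, while the slope is multiplied by the conversion factor (here 2.54, because one inch of girth spans 2.54 cm).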

3. The scatterplot below shows the relationship between the number of calories and amount of protein (in
grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the
display items, we might be interested in predicting the amount of protein a menu item has based on its
calorie content.

[Figure: two plots. Left: scatterplot of Protein (grams) vs. Calories. Right: Residuals vs. Predicted protein (grams).]

a. Describe the relationship between number of calories and amount of protein (in grams) that
Starbucks food menu items contain.
b. In this scenario, what are the predictor and outcome variables?
c. Why might we want to fit a regression line to these data?
d. What does the residuals vs. predicted plot tell us about the variability in our prediction errors
based on this model for items with lower vs. higher predicted protein?
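
For readers who want to see how a residuals vs. predicted plot is built, here is a minimal sketch. It fits a least-squares line, computes predicted values, and lists each residual (observed minus predicted) next to its prediction. The seven data points are invented for illustration and are not the Starbucks data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented values for illustration only; NOT the Starbucks menu data.
calories = [80, 150, 210, 300, 350, 420, 480]
protein = [2, 5, 4, 9, 14, 8, 20]

b0, b1 = fit_line(calories, protein)
predicted = [b0 + b1 * c for c in calories]
residuals = [y - y_hat for y, y_hat in zip(protein, predicted)]

# Sorting by predicted value mimics reading the plot from left to right.
for y_hat, e in sorted(zip(predicted, residuals)):
    print(f"predicted = {y_hat:6.2f}   residual = {e:+6.2f}")
```

Comparing the spread of the residuals at low and high predicted values is the numerical counterpart of what part (d) asks you to read off the plot.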
4. The following scatterplot shows the relationship between percent of population below the poverty level
(poverty) and unemployment rate among those ages 20-64 (unemployment_rate) in counties in the
US, as provided by data from the 2019 American Community Survey. The regression output for the
model for predicting poverty from unemployment_rate is also provided.
[Figure: scatterplot of Percent below the poverty level vs. Unemployment rate.]

term                estimate   std.error   statistic   p.value
(Intercept)            4.604       0.349      13.182   <0.0001
unemployment_rate      2.054       0.062      33.110   <0.0001

a. Write out the linear model.
b. Interpret the intercept.
c. Interpret the slope.
d. For this model R² is 46%. Interpret this value.
e. Calculate the correlation coefficient.
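
The arithmetic behind parts (a) and (e) can be sketched as follows, using only the numbers in the regression output and the stated R² of 46%. The function name `predict_poverty` is hypothetical, and the code assumes both variables are measured in percentage points, as the axis labels suggest.

```python
import math

# Values copied from the regression output in the problem.
intercept, intercept_se = 4.604, 0.349
slope, slope_se = 2.054, 0.062
r_squared = 0.46  # R² stated in part (d)

def predict_poverty(unemployment_rate):
    """Predicted percent below the poverty level (assumes percentage points)."""
    return intercept + slope * unemployment_rate

# Each test statistic is the estimate divided by its standard error.
# Small differences from the printed table come from rounded inputs.
t_intercept = intercept / intercept_se
t_slope = slope / slope_se

# In simple linear regression, r is the square root of R²
# and takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"predicted poverty at 5% unemployment: {predict_poverty(5):.3f}")
print(f"t (intercept) = {t_intercept:.2f}, t (slope) = {t_slope:.2f}")
print(f"r = {r:.3f}")
```

The sign step matters: R² alone cannot distinguish a positive from a negative association, so the sign of the correlation has to come from the slope estimate.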
5. The scatterplot below shows the percent of families who own their home vs. the percent of the population
living in urban areas. There are 52 observations: the 50 US states plus Puerto Rico and the District of
Columbia.
[Figure: scatterplot of Percent who own their home vs. Percent urban population.]
a. Describe the relationship between the percent of families who own their home and the percent of the
population living in urban areas.
b. The outlier at the bottom right corner is District of Columbia, where 100% of the population is
considered urban. What type of an outlier is this observation?
