BZAN 535: Linear Regression

The document discusses fitting linear regression models to sales data from a product sold over two years. It explores fitting a simple linear regression of total quantity on average price, as well as a log-log model. Diagnostic plots show some issues with assumptions for the linear model, while diagnostic plots for the log-log model show better fit. The bootstrap confidence interval for the slope of the linear model is wide, indicating uncertainty in the estimate.

Uploaded by

Motasima

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

39 views

BZAN 535: Linear Regression

Uploaded by

Motasima

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 11

BZAN 535: Linear regression

The file HW4PROB1.csv reports total weekly quantity sold and average price, as well as other
information, over a period of two years, for a product sold in Kroger.
First, exclude the data for weeks 1-21 and 82-102, to focus on the year when the
typical price appears to be $1.34.
(a) Examine the scatterplot of total_quantity against avg_price. Briefly discuss your
observations.
library(ggplot2)
library(dplyr)
prob1 = read.csv("HW4PROB1.csv")
max(prob1$week_no)

## [1] 102

prob1_data = prob1 %>% filter(week_no %in% c(22:81))

head(prob1_data)

## week_no total_sales_value total_quantity total_retail_discount avg_price

## 1 22 46.80 35 -8.85 1.337
## 2 23 56.18 42 -10.60 1.338
## 3 24 44.15 33 -8.32 1.338
## 4 25 69.49 52 -13.19 1.336
## 5 26 54.85 41 -10.34 1.338
## 6 27 58.85 44 -11.11 1.338

ggplot(prob1_data, aes(x=avg_price, y= total_quantity)) + geom_point()

Response: With the increase of average price the total quantity decreases. We can
assume there is a linear relation between them. There is a cluster near 1.33. If we try
to fit the data in SLM it will have a negative slope & high positive intercept.
(b) Fit a simple linear regression of total_quantity on avg_price. Report summary() of
your fitted model. Report the R❑2, RMSE and slope, and interpret all three in the
context of the problem.
b1 <- lm(total_quantity~avg_price, data = prob1_data)
summary(b1)

##
## Call:
## lm(formula = total_quantity ~ avg_price, data = prob1_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.896 -16.772 -7.644 2.262 105.011
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 335.64 22.49 14.92 <2e-16 ***
## avg_price -217.90 17.56 -12.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.74 on 58 degrees of freedom
## Multiple R-squared: 0.7265, Adjusted R-squared: 0.7217
## F-statistic: 154 on 1 and 58 DF, p-value: < 2.2e-16

RMSE = sqrt(sum(b1$residuals^2)/b1$df)
c("R-squared"=summary(b1)$r.squared, "RMSE"= RMSE)

## R-squared RMSE
## 0.7264598 26.7407362

Response: When we try to fit the data into a simple linear model, assuming they have
a linear relationship, where total_quantity = intercept + (slope)* avg_price. Intercept
is 335.64, means when avg_price is 0, total quantity is 335.64, which is practically not
possible. Slope is -217.90 means at any given point on the line the ratio of y & x will
be -217.90, as with the increase of x, y is decreasing, negative is there. R^2 depicts
how well the data is fitted into the SLM. the closer the value to 1, the more fitted it is.
Here our value is .72645, without further investigation it cannot be said as the best
fit for this model, but it is not a bad fit. Similarly, RMSE is the measure of the variance
of the sample points from the line. The smaller, the better. here we found it to be
26.74. Not a bad fit
(c) Report the residual diagnostic plots for the fitted model, including a plot of the
residuals vs. week_no. Comment on any problems or inadequacies evident in the
residual diagnostic plots (as far as assumptions for inference with your model in (b)
are concerned). Also, discuss potential solutions to these problems, if any.
par(mfrow= c(2,2)); plot(b1)
c1 = prob1_data %>% mutate(residual = b1$residuals)
head(c1)

## week_no total_sales_value total_quantity total_retail_discount avg_price

## 1 22 46.80 35 -8.85 1.337
## 2 23 56.18 42 -10.60 1.338
## 3 24 44.15 33 -8.32 1.338
## 4 25 69.49 52 -13.19 1.336
## 5 26 54.85 41 -10.34 1.338
## 6 27 58.85 44 -11.11 1.338
## residual
## 1 -9.31695390
## 2 -2.09905728
## 3 -11.09905728
## 4 7.46514948
## 5 -3.09905728
## 6 -0.09905728

ggplot(c1, aes(x= week_no, y= residual)) + geom_point() + geom_smooth(method

= "lm", se= TRUE)
c1 = c1 %>% mutate(log.total= log(total_quantity))
c1.log = lm(log.total ~ avg_price, data = c1)
summary(c1.log)

##
## Call:
## lm(formula = log.total ~ avg_price, data = c1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.85295 -0.15238 -0.01512 0.18389 0.71651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.5075 0.2946 25.48 <2e-16 ***
## avg_price -2.9013 0.2300 -12.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3503 on 58 degrees of freedom
## Multiple R-squared: 0.7329, Adjusted R-squared: 0.7283
## F-statistic: 159.1 on 1 and 58 DF, p-value: < 2.2e-16
par(mfrow= c(2,2)); plot(c1.log)

Response: From the Q-Q plot, the points on the right shows a bump, which means
linear model is not the good fit for these. In residual vs fitted plot, the red line do not
show any regular patterns. for the sample to be a good fit, this should have a straight
line along zero, showing the residuals are not varying. In scale-location plot, it is also
similar to, the previous one. shows, the varience between the SLM & actual value is
arbitrarily changing. Lastly the residuals vs leverage is showing there the values
outside the line who can influence the plot. From these plot, we can say, our
assumption on the linear relation between total quantity & avg price might not be
true. as the residual is not having 0 properties, & the variance is also not 0. Since we
can see a bump on the right end, we can consider the log of total quantity might be a
better fit
(d) Use bootstrap to obtain a 95% confidence interval of the slope (coefficient
corresponding to avg_price). Interpret the confidence interval you obtain.
set.seed(123)
lmod <- lm(total_quantity ~ avg_price, data = prob1_data) # fit model with
observed data
resids <- residuals(lmod) # rediduals from model on observed data
preds <- fitted(lmod) # predicted values from model on observed data
nb <- 4000 # number of bootsrap samples to draw
coefmat <- matrix(0,nb,2) # intitial empty matrix to store estimates
for(i in 1:nb) # repeat process
{
boot_y <- preds + sample(resids, rep=TRUE) # generate y* with given e*
bmod <- update(lmod, boot_y ~ .) # fit model with bootsrap data (x, y*)
coefmat[i,] <- coef(bmod) # store estimates from model on bootsrap data
}
colnames(coefmat) <- c("Intercept","Avg Price")
coefmat <- data.frame(coefmat)
cbind(t(apply(coefmat,2,function(x) quantile(x,c(0.025,0.975)))),
confint(lmod))

## 2.5% 97.5% 2.5 % 97.5 %

## Intercept 294.4504 381.5771 290.6243 380.6651
## Avg.Price -252.6721 -184.3160 -253.0400 -182.7532

Response: From the bootstrap, for the sample to correspond to SLM, we are 95%
confident that the slopes of all data will be within -252.9235 & -185.2393, intercept
will be 294.1508 381.5539
(e) Using the model you fitted in (b), report a 95% confidence interval for the expected
amount of weekly unit sales at $1.59. What does this reveal? (Hint: Create and
examine the residuals vs avg_price plot and comment on the residuals at $1.59.)
prob1_data$avg_price= as.numeric(prob1_data$avg_price)
predict(b1, newdata= data.frame(avg_price= 1.59), interval= "confidence",
level=.95)

## fit lwr upr

## 1 -10.81089 -24.13423 2.512451

ggplot(c1, aes(x=avg_price, y= residual)) + geom_point()+ geom_smooth(method

= "lm", se= TRUE)
Response: From this fit we are 95% confident that, expected amount of weekly sales
for 1.59 is between -24.13423 & 2.512451. the lower limit is practically not possible.
& the negative value depicts we have under estimated. This is also evident from
residuals, since we find positive residal at 1.59
(f) Now fit a simple linear regression model of log(total_quantity) on
log(avg_price). This is often referred to as the price elasticity of demand model.
Examine the regression diagnostic plots for this model and discuss your
observations. Also, interpret the slope of the model.
f1.lm = lm(log(total_quantity)~log(avg_price), data = prob1_data)
summary(f1.lm)

##
## Call:
## lm(formula = log(total_quantity) ~ log(avg_price), data = prob1_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82390 -0.17061 0.00956 0.18877 0.67901
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.60525 0.07063 65.20 <2e-16 ***
## log(avg_price) -3.46442 0.25339 -13.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3298 on 58 degrees of freedom
## Multiple R-squared: 0.7632, Adjusted R-squared: 0.7591
## F-statistic: 186.9 on 1 and 58 DF, p-value: < 2.2e-16

f1 = prob1_data %>% mutate(residual = f1.lm$residuals)

ggplot(f1, aes(x= week_no, y= residual)) + geom_point() + geom_smooth(method

= "lm", se= TRUE)

par(mfrow=c(2,2)) ;plot(f1.lm)
Response: Intercept is 4.60525, means when avg_price is 0, total quantity is 4.60525,
which is practically not possible yet, better fit from the Q-Q plot. Slope is -3.46442
means at any given point on the line the ratio of y & x will be -3.46442, as with the
increase of x, y is decreasing. From the Q-Q plot, the points on are roughly fitting with
the line, which means linear model is a the good fit for these. In residual vs fitted
plot, the red line do not show any regular patterns. But showing tendency towards 0.
In scale-location plot, it is also similar to, the previous one. shows, the variance
between the SLM & actual value is arbitrarily changing. Lastly the residuals vs
leverage is showing there the values outside the line who can influence the plot.
From these plot, we can say, our assumption on the linear relation between log of
total quantity & avg price is a better fit than the previous one
(g) Report a 95% confidence interval for the expected amount of weekly unit sales at
$1.59 using this model. What does this reveal? How does this differ from your
results in (d)?
prob1_data$avg_price= as.numeric(prob1_data$avg_price)

predict(f1.lm, newdata= data.frame(avg_price= 1.59), interval= "confidence",

level=.95)

## fit lwr upr

## 1 2.998679 2.849512 3.147846
conf.interval = c("lower"= exp(2.849512), "upper"= exp(3.147846))
conf.interval

## lower upper
## 17.27935 23.28585

ggplot(f1, aes(x=(avg_price), y= residual)) + geom_point() + geom_smooth()

Response: From this fit we are 95% confident that, expected amount of weekly sales
for 1.59 is between 17.27935 23.28585. Since neither in log scale or in linear scale
we have negative value in confidence interval it can be deemed as a better fit than
the one in e, Also, even though the residual is not 0, we now have residual close to 0,
residual observed is .3, we have less error in estimation & r2 is more than the
previous one as well.

Credit Report
100% (3)
Credit Report
19 pages
Farewell Offshoring, Outsourcing. Pandemic Rewrites CEO Playbook. - WSJ
No ratings yet
Farewell Offshoring, Outsourcing. Pandemic Rewrites CEO Playbook. - WSJ
7 pages
Invoice
No ratings yet
Invoice
1 page
Factorial Experiments R Codes
No ratings yet
Factorial Experiments R Codes
7 pages
Catene Rigide Serapid
No ratings yet
Catene Rigide Serapid
19 pages
Homework 2
100% (1)
Homework 2
14 pages
A2 copy 2
No ratings yet
A2 copy 2
8 pages
Case-Study-1-Codes
No ratings yet
Case-Study-1-Codes
4 pages
Chapter 3 Homework (Take 2)
No ratings yet
Chapter 3 Homework (Take 2)
7 pages
Multiple Linear Regression
100% (1)
Multiple Linear Regression
14 pages
Regressione Logistica1
No ratings yet
Regressione Logistica1
8 pages
Count Models in JAGS
No ratings yet
Count Models in JAGS
16 pages
Assignment 1
No ratings yet
Assignment 1
16 pages
20BCE1205 Lab3
No ratings yet
20BCE1205 Lab3
9 pages
Advance Stats Problem Statement 3 Analysis
No ratings yet
Advance Stats Problem Statement 3 Analysis
5 pages
Assignment R New 1
No ratings yet
Assignment R New 1
26 pages
2018-07-acb-precision-imprecision-and-trueness-bias-verification-d
No ratings yet
2018-07-acb-precision-imprecision-and-trueness-bias-verification-d
7 pages
WEEK
No ratings yet
WEEK
17 pages
Fitting The Nelson-Siegel-Svensson Model With Differential Evolution
No ratings yet
Fitting The Nelson-Siegel-Svensson Model With Differential Evolution
10 pages
R thing
No ratings yet
R thing
9 pages
HW3 Solutions - Stats 500: Problem 1
No ratings yet
HW3 Solutions - Stats 500: Problem 1
4 pages
H-311 Linear Regression Analysis With R
100% (1)
H-311 Linear Regression Analysis With R
71 pages
HW12
No ratings yet
HW12
10 pages
CPI.vt (1)
No ratings yet
CPI.vt (1)
11 pages
Linear Regression in R
No ratings yet
Linear Regression in R
19 pages
Bioestadistica: Clara Carner 2023-05-29
No ratings yet
Bioestadistica: Clara Carner 2023-05-29
4 pages
Linear Regression With Pytroch
No ratings yet
Linear Regression With Pytroch
13 pages
Problem Set 4 Solution Numerical Methods
No ratings yet
Problem Set 4 Solution Numerical Methods
6 pages
Problem Set 6 Solution Numerical Methods
No ratings yet
Problem Set 6 Solution Numerical Methods
11 pages
Arpan Final Assignment EcII
No ratings yet
Arpan Final Assignment EcII
39 pages
Tutorial-4
No ratings yet
Tutorial-4
16 pages
HW2 MAT353 RCode
No ratings yet
HW2 MAT353 RCode
11 pages
Homework 2
100% (1)
Homework 2
12 pages
Panel 2
No ratings yet
Panel 2
26 pages
2101f12LogisticRegressionWithR1
No ratings yet
2101f12LogisticRegressionWithR1
10 pages
Anova Review
100% (1)
Anova Review
8 pages
PHD Econ, Applied Econometrics 2021/22 - Takehome University of Innsbruck
No ratings yet
PHD Econ, Applied Econometrics 2021/22 - Takehome University of Innsbruck
20 pages
R Code Default Data PDF
No ratings yet
R Code Default Data PDF
10 pages
DSC241_MansiSharma_FinalExam_Dec14
No ratings yet
DSC241_MansiSharma_FinalExam_Dec14
25 pages
Dosis Respuesta R
No ratings yet
Dosis Respuesta R
11 pages
HW1 - Problem 3.4.1: General Model
No ratings yet
HW1 - Problem 3.4.1: General Model
6 pages
Fitting The Nelson-Siegel-Svensson Model With Differential Evolution
No ratings yet
Fitting The Nelson-Siegel-Svensson Model With Differential Evolution
10 pages
Shivam Batra (19BPS1131) 21/01/2022: List
No ratings yet
Shivam Batra (19BPS1131) 21/01/2022: List
5 pages
m565A3-24F
No ratings yet
m565A3-24F
22 pages
Practice-Training_BTTC
No ratings yet
Practice-Training_BTTC
25 pages
STAT2 2e R Markdown Files Sec4.3
No ratings yet
STAT2 2e R Markdown Files Sec4.3
4 pages
HW2 Solution
No ratings yet
HW2 Solution
7 pages
Welcome To Cmpe140 Final Exam: Studentid
No ratings yet
Welcome To Cmpe140 Final Exam: Studentid
21 pages
Dar Solved Ans
No ratings yet
Dar Solved Ans
20 pages
Homework 3 R Tutorial: How To Use This Tutorial
No ratings yet
Homework 3 R Tutorial: How To Use This Tutorial
8 pages
Assingment 8: Chitresh Kumar
No ratings yet
Assingment 8: Chitresh Kumar
20 pages
Modern Regression Homework 5-1
No ratings yet
Modern Regression Homework 5-1
8 pages
Practical Machine Learning
No ratings yet
Practical Machine Learning
11 pages
Lab-9 RMD
No ratings yet
Lab-9 RMD
5 pages
A1
No ratings yet
A1
8 pages
Assignment 7: Chitresh Kumar
No ratings yet
Assignment 7: Chitresh Kumar
12 pages
Using Maxlik
No ratings yet
Using Maxlik
20 pages
Homework #6
No ratings yet
Homework #6
16 pages
ANZ Virtual Internship Module Model Answer For Task 1
No ratings yet
ANZ Virtual Internship Module Model Answer For Task 1
7 pages
ch12bt20 GDP-update
No ratings yet
ch12bt20 GDP-update
15 pages
Linear Regression
No ratings yet
Linear Regression
15 pages
Machine Learning-Lecture 1(Student)
No ratings yet
Machine Learning-Lecture 1(Student)
14 pages
Fx-82sx 220plus Etc en
No ratings yet
Fx-82sx 220plus Etc en
2 pages
Gd Script
From Everand
Gd Script
Marijo Trkulja
No ratings yet
GRAP 17 Property Plant Equipment 1 April 2021
No ratings yet
GRAP 17 Property Plant Equipment 1 April 2021
37 pages
14 F DISTRIBUTION - Pps
No ratings yet
14 F DISTRIBUTION - Pps
11 pages
Form Penawaran Rig 350 HP - Rev1
No ratings yet
Form Penawaran Rig 350 HP - Rev1
25 pages
Esc-Unit 4 (Rate Analysis)
No ratings yet
Esc-Unit 4 (Rate Analysis)
15 pages
17 9 2020 Apv2.0000002375 Apvz2-Ocs-325902
No ratings yet
17 9 2020 Apv2.0000002375 Apvz2-Ocs-325902
1 page
4704315data Interpretation 2 Question 7 Pages 02-09-19
No ratings yet
4704315data Interpretation 2 Question 7 Pages 02-09-19
7 pages
Chapter 15 Harvesting The Entrepreneurial Venture
No ratings yet
Chapter 15 Harvesting The Entrepreneurial Venture
29 pages
Managerial Economics Foundations of Business Analysis and Strategy 12th Edition Thomas Test Bank 1
100% (48)
Managerial Economics Foundations of Business Analysis and Strategy 12th Edition Thomas Test Bank 1
36 pages
Cobb Douglas
No ratings yet
Cobb Douglas
6 pages
Conveyance & Deemed Conveyance
No ratings yet
Conveyance & Deemed Conveyance
37 pages
Assignment 7 - Engineering Statistics - Spring 2018
No ratings yet
Assignment 7 - Engineering Statistics - Spring 2018
6 pages
BP Op Entpr S4hana1809 05 Forms List en BR
No ratings yet
BP Op Entpr S4hana1809 05 Forms List en BR
5 pages
CB 2021 Pelletgear Buyersguide
No ratings yet
CB 2021 Pelletgear Buyersguide
4 pages
Ap ECONOMY
No ratings yet
Ap ECONOMY
2 pages
Circular (Fee Payment) - Ay 18-19
No ratings yet
Circular (Fee Payment) - Ay 18-19
2 pages
Chapter 6 PDF
No ratings yet
Chapter 6 PDF
3 pages
DFS Opinion
No ratings yet
DFS Opinion
4 pages
FAC-Daily Report-RAMESH
No ratings yet
FAC-Daily Report-RAMESH
44 pages
Daftarpeserta Bayanrun2023
No ratings yet
Daftarpeserta Bayanrun2023
18 pages
A Brief Guide To Hand Weaving Looms
No ratings yet
A Brief Guide To Hand Weaving Looms
4 pages
Eco 1101 Introductory Microeconomics
No ratings yet
Eco 1101 Introductory Microeconomics
2 pages
Tax Invoice/Bill of Supply/Cash Memo: (Original For Recipient)
No ratings yet
Tax Invoice/Bill of Supply/Cash Memo: (Original For Recipient)
1 page
Hsslive Xii Acc Ca Key Ajith March 2024
No ratings yet
Hsslive Xii Acc Ca Key Ajith March 2024
6 pages
GROUP 2 - Group Activity 3
No ratings yet
GROUP 2 - Group Activity 3
3 pages
Emp Elétr - Patolada Jungheinrich (Aberta) EJC - B12 - B20 - UK
No ratings yet
Emp Elétr - Patolada Jungheinrich (Aberta) EJC - B12 - B20 - UK
4 pages
LYSA Ninja Van - Independent-Contractor-Agreement
No ratings yet
LYSA Ninja Van - Independent-Contractor-Agreement
3 pages