Simple Linear Regression Using a Real Dataset in R and Excel

Uploaded by bsf23000703
Copyright © All Rights Reserved

Simple linear regression using a real dataset in R

Video: Fit simple linear models

Earlier, the marketing dataset from the datarium package was used in R to explore relationships in the data. A strong
positive linear relationship was found between the variables YouTube and sales. In this section, the relationship
between these two variables is explored further using simple linear regression.

Load the marketing dataset into a variable in R using the following code:

library(datarium)  # loads the package that provides the marketing dataset

md <- marketing

Use the lm() function to fit a linear regression of sales on YouTube. Here, sales is the response variable and
YouTube is the explanatory variable.

model <- lm(sales ~ youtube, data = md)  # formula: response ~ explanatory

summary(model)
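Rather than reading numbers off the printed summary, they can be pulled out of the fitted model object. A minimal sketch using only base R functions (coef() and the components of summary()); as an assumption made only so it runs anywhere, it uses the marketing data when datarium is installed and otherwise falls back to R's built-in cars data (speed as x, dist as y). The names d, b, coef_table, r2 and rse are illustrative.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  d <- data.frame(x = datarium::marketing$youtube, y = datarium::marketing$sales)
} else {
  d <- data.frame(x = cars$speed, y = cars$dist)
}

model <- lm(y ~ x, data = d)
s <- summary(model)

b          <- coef(model)       # named vector: (Intercept), then the slope for x
coef_table <- s$coefficients    # columns: Estimate, Std. Error, t value, Pr(>|t|)
r2         <- s$r.squared       # Multiple R-squared
rse        <- s$sigma           # residual standard error

cat(sprintf("fitted line: y = %.3f + %.3f * x\n", b[1], b[2]))
cat(sprintf("R-squared = %.4f, residual SE = %.2f\n", r2, rse))
```

With the marketing data, these values should agree with the summary output: intercept 8.439, slope 0.048, R-squared 0.6119 and residual standard error 3.91.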

This code produces the following output:

Call:
lm(formula = sales ~ youtube, data = md)

Residuals:
     Min       1Q   Median       3Q      Max
-10.0632  -2.3454  -0.2295   2.4805   8.6548

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.439112   0.549412   15.36   <2e-16 ***
youtube     0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.91 on 198 degrees of freedom
Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099
F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

Interpretation of R Output

o The estimated regression line equation is sales = 8.439 + 0.048 · youtube. The y-intercept is 8.439 and the
slope is 0.048 (both to 3 decimal places). Note: the "Estimate" column of the "Coefficients" section holds both
values: the (Intercept) row gives the y-intercept, and the youtube row gives the slope of the estimated
regression line.

o Alternatively, compute the slope of the regression line (to 3 decimal places) using the formula
$b_1 = r \cdot \frac{s_y}{s_x}$, where $r$ is the correlation coefficient, $s_y$ is the standard deviation of the
dependent variable (sales), and $s_x$ is the standard deviation of the independent variable (YouTube). Calculate this
value in R using the following code:

cor(md$youtube, md$sales) * (sd(md$sales) / sd(md$youtube))

This code produces the following output:

[1] 0.04753664

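o Alternatively, compute the y-intercept of the regression line (to 1 decimal place) using the formula
$b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y}$ is the mean of sales and $\bar{x}$ is the mean of YouTube. Calculate this
value in R using the following code, which plugs in the slope rounded to 0.048:

mean(md$sales) - (0.048 * mean(md$youtube))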

This code produces the following output:

[1] 8.357352

Note: The slope computed with the formula matches the lm() estimate. The y-intercept computed here (8.357) differs
slightly from the lm() result (8.439) because the slope was rounded to two significant figures (0.048) before it was used.
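Both formulas can be checked against lm() in one pass, which also shows that the intercept gap comes only from rounding. A minimal sketch, assuming the same fallback as before (marketing if datarium is installed, otherwise the built-in cars data); the covariance form cov(x, y) / var(x) is an algebraically equivalent rewriting of $r \cdot s_y / s_x$ added here for comparison, not something the lesson uses.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  x <- datarium::marketing$youtube
  y <- datarium::marketing$sales
} else {
  x <- cars$speed
  y <- cars$dist
}

fit <- coef(lm(y ~ x))

# Slope: b1 = r * (s_y / s_x), equivalently cov(x, y) / var(x)
b1     <- cor(x, y) * sd(y) / sd(x)
b1_cov <- cov(x, y) / var(x)

# Intercept: b0 = ybar - b1 * xbar, first with the full-precision slope ...
b0_exact <- mean(y) - b1 * mean(x)
# ... then with the slope rounded to two significant figures beforehand
b0_rounded <- mean(y) - signif(b1, 2) * mean(x)

print(rbind(
  slope     = c(formula = b1, lm = unname(fit[2])),
  intercept = c(formula = b0_exact, lm = unname(fit[1]))
))
print(c(intercept_with_rounded_slope = b0_rounded))
```

With the marketing data, signif(b1, 2) is 0.048, so the last value reproduces the 8.357 shown above, while the full-precision intercept matches the 8.439 from lm().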

o When the YouTube advertising budget is 0, sales are expected to be 8.439 thousand dollars, that is, 8,439 dollars
(recall that sales and YouTube are both measured in thousands of dollars).

o A 1-unit increase in the YouTube budget is associated with an expected increase of 0.048 units in sales.
Because sales and YouTube are both given in thousands of dollars, a 1,000-dollar increase in the YouTube budget
corresponds to an expected increase in sales of about 48 dollars.
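Both interpretations can be checked with predict(): the prediction at a budget of 0 equals the intercept, and moving the budget up by one unit moves the prediction up by the slope. A minimal sketch, again assuming the marketing data when datarium is installed and the built-in cars data otherwise; the budgets 100 and 101 are arbitrary illustrative values.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  d <- data.frame(x = datarium::marketing$youtube, y = datarium::marketing$sales)
} else {
  d <- data.frame(x = cars$speed, y = cars$dist)
}

model <- lm(y ~ x, data = d)
b <- coef(model)

# Predictions at x = 0 and at two illustrative values one unit apart
pred <- unname(predict(model, newdata = data.frame(x = c(0, 100, 101))))

at_zero  <- pred[1]            # should equal the intercept
one_step <- pred[3] - pred[2]  # should equal the slope

# Only meaningful for marketing, where both variables are in thousands of
# dollars: expected change in sales (in dollars) per extra 1,000 dollars
dollars_per_1000 <- 1000 * unname(b[2])

print(c(at_zero = at_zero, one_step = one_step, dollars_per_1000 = dollars_per_1000))
```

For the marketing data, dollars_per_1000 is about 47.5, which rounds to the 48 dollars quoted above.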

Simple linear regression using a real dataset in Excel

Import the marketing.csv dataset from the DATA folder into a new Excel worksheet.

Perform a regression analysis using the Data Analysis ToolPak in Excel.


If the ToolPak is not yet installed, click File > Options > Add-ins.
In the Manage drop-down list, select Excel Add-ins and click Go.
In the Add-ins window that pops up, select Analysis ToolPak and click OK.
The Data Analysis button then appears on the Data tab.

Next, click the Data Analysis button on the Data tab.


Select Regression and click OK.
In the Regression dialog box that pops up, set the Input Y Range to the sales column and the Input X Range to the
YouTube column.
Check Labels if your ranges include the column headers (sales and YouTube); in that case, the Y range should be
$D$1:$D$201 and the X range should be $A$1:$A$201.
Under Output options, select New Worksheet Ply. Check Residuals to obtain the differences between the actual and
predicted values. Click OK.
Figure 3-23 displays the Regression dialog box populated with the required values.
Figure 3-23
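For comparison, the per-observation table that Excel's Residuals option produces (predicted value and residual, where residual = actual minus predicted) can be assembled in R from fitted() and residuals(). A minimal sketch; the column names are our own and are not meant to copy Excel's exact labels, and the data fallback (marketing if datarium is installed, else the built-in cars data) is an assumption so the block runs anywhere.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  d <- data.frame(x = datarium::marketing$youtube, y = datarium::marketing$sales)
} else {
  d <- data.frame(x = cars$speed, y = cars$dist)
}

model <- lm(y ~ x, data = d)

res_table <- data.frame(
  observation = seq_len(nrow(d)),
  predicted   = unname(fitted(model)),
  residual    = unname(residuals(model))  # actual minus predicted
)

print(head(res_table))
```

Because the model includes an intercept, the residuals sum to (numerically) zero, and predicted plus residual gives back each observed value.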

Interpreting the Regression Analysis Output

The table in Figure 3-24 provides statistical measures that indicate how well the model fits the data.

Figure 3-24

R-squared is a statistical measure of the proportion of the variance in the response variable (sales) that is explained
by the explanatory variable (YouTube). In general, the larger the R-squared value, the better the regression model fits
the observations. The R-squared value of 0.6119 indicates that the YouTube budget accounts for approximately
61% of the variance in sales. In a simple linear regression, Multiple R is the absolute value of the correlation
coefficient r computed earlier; here it equals the square root of 0.6119, about 0.782.
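R-squared can be computed directly as $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, and in simple regression its square root equals $|r|$. A minimal sketch checking both against lm() and cor(); as before, it assumes the marketing data when datarium is installed and the built-in cars data otherwise.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  x <- datarium::marketing$youtube
  y <- datarium::marketing$sales
} else {
  x <- cars$speed
  y <- cars$dist
}

model <- lm(y ~ x)

sse <- sum(residuals(model)^2)  # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)     # total sum of squares around the mean
r2  <- 1 - sse / sst            # proportion of variance in y explained by x

multiple_r <- sqrt(r2)          # Excel's "Multiple R" in simple regression

print(c(R2 = r2, multiple_R = multiple_r, abs_cor = abs(cor(x, y))))
```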

The standard error of the regression (called the residual standard error in the R output) summarizes how far the
observed values typically fall from the regression line, in the units of the response variable. In this example, it is
3.91 (thousand dollars of sales). Lower values are better, because they indicate that the data points lie close to the
fitted values.
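The standard error of the regression is $s = \sqrt{\mathrm{SSE}/(n-2)}$, where $n - 2$ is the residual degrees of freedom (two coefficients are estimated). A minimal sketch comparing the hand computation with the sigma component of summary(); the data fallback (marketing if datarium is installed, else the built-in cars data) is an assumption so the block runs anywhere.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  x <- datarium::marketing$youtube
  y <- datarium::marketing$sales
} else {
  x <- cars$speed
  y <- cars$dist
}

model <- lm(y ~ x)

n   <- length(y)
sse <- sum(residuals(model)^2)  # sum of squared residuals
rse <- sqrt(sse / (n - 2))      # n - 2 degrees of freedom

print(c(by_hand = rse, from_summary = summary(model)$sigma, df = n - 2))
```

With the 200 marketing observations, df is 198 and the value should match the 3.91 reported above.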

An analyst can show the relationship between sales and YouTube using a linear regression chart.

To create this chart, first, create a scatter plot of sales against YouTube using the method from earlier in the lesson.

Now, draw the least squares regression line. Right-click any data point and select Add Trendline from the context
menu.

In the pane that appears on the right, select Linear under Trendline Options and check Display Equation on chart.
Use the Fill & Line tab to customize the line, e.g., make it a solid red line. Figure 3-27 shows the resulting linear
regression chart.
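A comparable chart can be drawn in base R with plot() and abline(), with a trendline-style equation as the title. A minimal sketch; the label format only imitates Excel's "y = ...x + ..." style and is not guaranteed to match Excel's rounding, pdf(NULL) is used so it runs without a screen, and the data fallback (marketing if datarium is installed, else the built-in cars data) is an assumption.

```r
# Use marketing (datarium) when installed; otherwise the built-in cars data
# stands in (an assumption, only so this sketch runs without datarium).
if (requireNamespace("datarium", quietly = TRUE)) {
  x <- datarium::marketing$youtube
  y <- datarium::marketing$sales
  x_name <- "youtube"
  y_name <- "sales"
} else {
  x <- cars$speed
  y <- cars$dist
  x_name <- "speed"
  y_name <- "dist"
}

model <- lm(y ~ x)
b <- unname(coef(model))

# Trendline-style label "y = <slope>x + <intercept>", handling a negative intercept
eq_label <- sprintf("y = %.4fx %s %.4f", b[2], if (b[1] < 0) "-" else "+", abs(b[1]))

pdf(NULL)  # null device so the sketch runs headless; omit this line interactively
plot(x, y, xlab = x_name, ylab = y_name, main = eq_label, pch = 19)
abline(model, col = "red", lwd = 2)  # solid red least squares regression line
invisible(dev.off())

print(eq_label)
```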

Figure 3-27

The equation displayed on the chart in Figure 3-27 has the same coefficients as those obtained from the R and Excel
regression outputs.

Testing the significance of the slope
