
Simple Linear Regression
Learning Objectives

In this session, you will learn:


▪ To use regression analysis to predict the value of a
dependent variable based on an independent variable
▪ The meaning of the regression coefficients b0 and b1
▪ To evaluate the assumptions of regression analysis and
know what to do if the assumptions are violated
▪ To make inferences about the slope and correlation
coefficient
▪ To estimate mean values and predict individual values
Correlation vs. Regression

▪ A scatter plot (or scatter diagram) can be used to show the relationship between two numerical variables
▪ Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
▪ Correlation is only concerned with the strength of the relationship
▪ No causal effect is implied by correlation
▪ Correlation was already covered in Measures of Tendency
Regression Analysis

Regression analysis is used to:


▪ Predict the value of a dependent variable based on the
value of at least one independent variable
▪ Explain the impact of changes in an independent variable
on the dependent variable
Dependent variable: the variable you wish to explain
Independent variable: the variable used to explain
the dependent variable
Simple Linear Regression
Model

▪ Only one independent variable, X


▪ Relationship between X and Y is described
by a linear function
▪ Changes in Y are related to changes in X
Types of Relationships
(Scatter plot panels contrasting linear relationships with curvilinear relationships)
Types of Relationships
(Scatter plot panels contrasting strong relationships with weak relationships)
Types of Relationships

(Scatter plot showing no relationship between X and Y)
The Linear Regression Model

The population regression model:

  Yi = β0 + β1Xi + εi

where:
  Yi = dependent variable
  Xi = independent variable
  β0 = population Y intercept
  β1 = population slope coefficient
  εi = random error term

The model has a linear component (β0 + β1Xi) and a random error component (εi).
The Linear Regression Model

(Graph: for a given Xi, the observed value of Y differs from the predicted value on the population regression line by the random error εi; the line has intercept β0 and slope β1)
Linear Regression Equation:
PREDICTION LINE
The simple linear regression equation provides an
estimate of the population regression line

  Ŷi = b0 + b1Xi

where:
  Ŷi = estimated (or predicted) Y value for observation i
  b0 = estimate of the regression intercept
  b1 = estimate of the regression slope
  Xi = value of X for observation i
The Least Squares Method

▪ b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:

  min Σ(Yi − Ŷi)² = min Σ(Yi − (b0 + b1Xi))²
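A minimal Python sketch of the closed-form solution to this least-squares criterion (the helper name least_squares is illustrative, not from the slides):

# Least-squares estimates b0 and b1 for simple linear regression.
def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (Xbar, Ybar)
    return b0, b1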
SUNFLOWERS APPAREL DATA
Finding the Least Squares
Equation
▪ The coefficients b0 and b1, and other regression results, are found in Excel. Excel labels b0 as Intercept and b1 as Profiled Customer in the Sunflowers example.

Formulas are shown in the text for


those who are interested
b0 = -1.2088
b1 = 2.0742

Using these estimates, the prediction line is

  Ŷi = −1.2088 + 2.0742Xi

The slope, b1, is +2.0742. This means that for each increase of 1 unit in X, the predicted value of Y is estimated to increase by 2.0742 units. In other words, for each increase of 1.0 million profiled customers within 30 minutes of the store, the predicted mean annual sales are estimated to increase by $2.0742 million. So the slope represents the estimated change in mean annual sales for each additional million profiled customers.
▪ The Y intercept, b0, is −1.2088. The Y intercept represents the predicted value of Y when X = 0. Because the number of profiled customers of the store cannot be zero, this Y intercept has little or no practical interpretation.
▪ Also, the Y intercept for this example is outside the range of the observed values of the X variable, and therefore the interpretation of the value of b0 should be made cautiously

Interpretation of the Intercept and
the Slope

▪ b0 is the estimated mean value of Y when the value


of X is zero
▪ b1 is the estimated change in the mean value of Y
for every one-unit change in X
Problem 2
Linear Regression Example

▪ A real estate agent wishes to examine the


relationship between the selling price of a home and
its size (measured in square feet)

▪ A random sample of 10 houses is selected


▪ Dependent variable (Y) = house price in $1000s
▪ Independent variable (X) = square feet
Linear Regression Example Data
House Price in $1000s (Y)    Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
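As a check, the least_squares helper sketched earlier (an illustrative name, not part of the slides) can be applied to this data; the estimates should match the Excel output that follows:

# House price example: regress price ($1000s) on square feet.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = least_squares(square_feet, price)
print(round(b0, 5), round(b1, 5))  # approximately 98.24833 and 0.10977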
Linear Regression Example
Scatterplot
▪ House price model: scatter plot
Linear Regression Example
Using Excel

Tools
--------
Data Analysis
--------
Regression
Linear Regression Example
Excel Output
The regression equation is: house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842

Standard Error 41.33032

Observations 10

ANOVA
df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Linear Regression Example
Graphical Representation
▪ House price model: scatter plot and regression line

Slope = 0.10977
Intercept = 98.248
Linear Regression Example
Interpretation of b0

▪ b0 is the estimated mean value of Y when the value


of X is zero (if X = 0 is in the range of observed X
values)
▪ Because the square footage of the house cannot be 0,
the Y intercept has no practical application.
Linear Regression Example
Interpretation of b1

▪ b1 estimates the change in the mean value of Y for each one-unit change in X
▪ Here, b1 = 0.10977 tells us that the mean price of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size
Linear Regression Example
Making Predictions
Predict the price for a house with 2000 square feet:

  house price = 98.25 + 0.1098(2000) = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
Linear Regression Example
Making Predictions
▪ When using a regression model for prediction, only
predict within the relevant range of data
Relevant range for
interpolation

Do not try to
extrapolate beyond
the range of
observed X’s



2.

▪ Use the prediction line to predict the annual sales for a store with 4 million profiled customers.
Solution
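Using the prediction line for the Sunflowers data, Ŷi = −1.2088 + 2.0742Xi, with Xi = 4 (million profiled customers):

  Ŷ = −1.2088 + 2.0742(4) = 7.0880

so the predicted mean annual sales are approximately $7.09 million.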
3.

▪ Compute the Y intercept, b0, and the slope,


b1, for the Sunflowers Apparel Data.
Measures of Variation
Total variation is made up of two parts:

  SST = SSR + SSE
  (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

where:
  Ȳ  = mean value of the dependent variable
  Yi = observed value of the dependent variable
  Ŷi = predicted value of Y for the given Xi value
Measures of Variation

▪ SST = total sum of squares


▪ Measures the variation of the Yi values around their
mean Y
▪ SSR = regression sum of squares
▪ Explained variation attributable to the relationship
between X and Y
▪ SSE = error sum of squares
▪ Variation attributable to factors other than the
relationship between X and Y
Measures of Variation
(Graph decomposing the deviation of each Yi around Ȳ into an explained part and an unexplained part)

  SST = Σ(Yi − Ȳ)²
  SSR = Σ(Ŷi − Ȳ)²
  SSE = Σ(Yi − Ŷi)²
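Continuing the earlier Python sketch (square_feet, price, b0, b1 come from that sketch, not the slides), these sums of squares can be computed directly; the printed values should agree with the ANOVA table shown later (SSR ≈ 18934.93, SSE ≈ 13665.57, SST = 32600.5):

# Decompose total variation into explained (SSR) and unexplained (SSE) parts.
def variation_measures(x, y, b0, b1):
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]                     # predicted values
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    return sst, ssr, sse

sst, ssr, sse = variation_measures(square_feet, price, b0, b1)
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(ssr / sst, 5))  # last value is r-squared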
Coefficient of Determination, r²
▪ The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
▪ The coefficient of determination is also called r-squared and is denoted r²:

  r² = SSR / SST  (regression sum of squares / total sum of squares), with 0 ≤ r² ≤ 1
Coefficient of Determination, r²

r² = 1

(Scatter plots: all points fall exactly on a straight line, one panel with positive slope and one with negative slope)

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
Coefficient of Determination, r²

0 < r² < 1

(Scatter plots: points scattered around a fitted line)

Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X
Coefficient of Determination, r²

r² = 0

(Scatter plot: points show no pattern around a horizontal line)

No linear relationship between X and Y: the value of Y is not related to X (none of the variation in Y is explained by variation in X)
Linear Regression Example
Coefficient of Determination, r²
Regression Statistics

Multiple R 0.76211

R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10

r² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082: 58.08% of the variation in house prices is explained by variation in square feet

ANOVA
df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Linear Regression Example
Standard Error of Estimate
Regression Statistics

Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842

Standard Error 41.33032

Observations 10

ANOVA
df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000
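The standard error of the estimate shown in the Regression Statistics above is, by the standard identity (not spelled out on the slide), the square root of the mean square error from the ANOVA table:

  S_YX = √(SSE / (n − 2)) = √(13665.5652 / 8) = √1708.1957 = 41.33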



4.

▪ Compute the coefficient of determination, r², for the Sunflowers Apparel data.
Measuring Autocorrelation:
The Durbin-Watson Statistic
▪ Used when data are collected over time to
detect if autocorrelation is present
▪ Autocorrelation exists if residuals in one
time period are related to residuals in
another period
Autocorrelation
▪ Autocorrelation is correlation of the errors
(residuals) over time

▪ (Plot of residuals over time) Here, the residuals suggest a cyclic pattern, not random scatter

▪ Violates the regression assumption that residuals are


statistically independent
The Durbin-Watson Statistic
▪ The Durbin-Watson statistic is used to test for autocorrelation

H0: residuals are not correlated


H1: autocorrelation is present

▪ The possible range is 0 ≤ D ≤ 4

▪ D should be close to 2 if H0 is true


▪ D less than 2 may signal positive
autocorrelation, D greater than 2 may
signal negative autocorrelation
The Durbin-Watson Statistic
H0: positive autocorrelation does not exist
H1: positive autocorrelation is present
▪ Calculate the Durbin-Watson test statistic = D
(The Durbin-Watson Statistic can be found using Excel)

▪ Find the values dL and dU from the Durbin-Watson table


(for sample size n and number of independent variables k)

Decision rule: reject H0 if D < dL; the test is inconclusive if dL ≤ D ≤ dU; do not reject H0 if D > dU
The Durbin-Watson Statistic
▪ Example with n = 25:
Excel output:
Durbin-Watson Calculations
Sum of Squared
Difference of Residuals 3296.18
Sum of Squared Residuals 3279.98
Durbin-Watson Statistic 1.00494
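A minimal Python sketch of this calculation (the residual list is a placeholder; the slide does not list the 25 residuals):

# Durbin-Watson statistic: sum of squared successive differences of the residuals
# divided by the sum of squared residuals.
def durbin_watson(residuals):
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# residuals = [...]            # the n = 25 residuals from the fitted model
# durbin_watson(residuals)     # would give 3296.18 / 3279.98 = 1.00494 here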
The Durbin-Watson Statistic
▪ Here, n = 25 and there is k = 1 independent variable
▪ Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
▪ D = 1.00494 < dL = 1.29, so reject H0 and conclude that
significant positive autocorrelation exists
▪ Therefore the linear model is not the appropriate model to
predict sales
Decision: reject H0, since D = 1.00494 < dL = 1.29
(Regions: reject H0 for D < dL = 1.29; inconclusive for 1.29 ≤ D ≤ dU = 1.45; do not reject H0 for D > 1.45)
Inferences About the Slope:
t Test
▪ t test for a population slope
▪ Is there a linear relationship between X and Y?
▪ Null and alternative hypotheses
▪ H0: β1 = 0 (no linear relationship)
▪ H1: β1 ≠ 0 (linear relationship does exist)
▪ Test statistic (d.f. = n − 2):

  t_STAT = (b1 − β1) / Sb1

where:
  b1 = regression slope coefficient
  β1 = hypothesized slope
  Sb1 = standard error of the slope
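A small Python check of this statistic for the house-price example, using scipy for the two-sided p-value (b1 and Sb1 are taken from the Excel output shown on the next slides):

from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_stat = (b1 - 0) / s_b1                          # hypothesized slope is 0 under H0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value, n - 2 d.f.
print(round(t_stat, 5), round(p_value, 5))        # approximately 3.32938 and 0.01039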
Inferences About the Slope:
t Test Example
House Price in $1000s (Y)    Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700

Estimated regression equation: house price = 98.25 + 0.1098 (square feet)
The slope of this model is 0.1098.
Is there a relationship between the square footage of the house and its sales price?
Inferences About the Slope:
t Test Example
▪ H0: β1 = 0
▪ H1: β1 ≠ 0

From Excel output (b1 and Sb1 are on the Square Feet row):
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039

  t_STAT = (b1 − 0) / Sb1 = 0.10977 / 0.03297 = 3.32938
Inferences About the Slope:
t Test Example
▪ H0: β1 = 0
▪ H1: β1 ≠ 0

Test statistic: t_STAT = 3.329
d.f. = 10 − 2 = 8; with α/2 = .025, the critical values are ±tα/2 = ±2.3060

Decision: reject H0, since t_STAT = 3.329 > 2.3060
There is sufficient evidence that square footage affects house price
Inferences About the Slope:
t Test Example
▪ H0: β1 = 0
▪ H1: β1 ≠ 0

From Excel output, the p-value for Square Feet is 0.01039:
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039
Decision: Reject H0, since p-value < α

There is sufficient evidence that


square footage affects house price.
F-Test for Significance
▪ F test statistic:

  F_STAT = MSR / MSE,  where MSR = SSR / k and MSE = SSE / (n − k − 1)

where F_STAT follows an F distribution with k numerator degrees of freedom and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
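A quick Python check of this statistic for the house-price example, using the sums of squares from the ANOVA output on the next slide and scipy for the critical value:

from scipy import stats

ssr, sse, n, k = 18934.9348, 13665.5652, 10, 1
msr = ssr / k                  # mean square due to regression
mse = sse / (n - k - 1)        # mean square error
f_stat = msr / mse
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # critical value at alpha = .05
print(round(f_stat, 4), round(f_crit, 2))         # approximately 11.0848 and 5.32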


F-Test for Significance
Excel Output
Regression Statistics

Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842


Standard Error 41.33032
Observations 10

(In the ANOVA table below, the F statistic has 1 and 8 degrees of freedom; Significance F is its p-value)

ANOVA
df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000
F-Test for Significance
▪ H0: β1 = 0
▪ H1: β1 ≠ 0
▪ α = .05
▪ df1 = 1, df2 = 8

Test statistic: F_STAT = MSR / MSE = 11.08
Critical value: Fα = F.05 = 5.32

Decision: reject H0 at α = 0.05, since F_STAT = 11.08 > 5.32
Conclusion: there is sufficient evidence that house size affects selling price
Confidence Interval Estimate
for the Slope
Confidence interval estimate of the slope:  b1 ± tα/2 Sb1,  with d.f. = n − 2

Excel Printout for House Prices:


Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

At the 95% level of confidence, the confidence interval


for the slope is (0.0337, 0.1858)
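Worked out with the values above (not shown on the slide): b1 ± tα/2 Sb1 = 0.10977 ± 2.3060(0.03297) = 0.10977 ± 0.07603, which reproduces the Excel limits (0.0337, 0.1858).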
Confidence Interval Estimate
for the Slope
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

Since the units of the house price variable are $1000s, you are 95% confident that the mean change in sales price is between $33.74 and $185.80 per additional square foot of house size

This 95% confidence interval does not include 0.


Conclusion: There is a significant relationship between house price and
square feet at the .05 level of significance
t Test for a Correlation Coefficient
▪ Hypotheses
▪ H0: ρ = 0 (no correlation between X and Y)
▪ H1: ρ ≠ 0 (correlation exists)

▪ Test statistic (with n − 2 degrees of freedom):

  t_STAT = (r − ρ) / √((1 − r²) / (n − 2))
t Test for a Correlation Coefficient

Is there evidence of a linear relationship between


square feet and house price at the .05 level of
significance?
H0: ρ = 0 (No correlation)
H1: ρ ≠ 0 (correlation exists)
α =.05 , df = 10 - 2 = 8
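Working the numbers (a check using values from the Excel output): with r² = 0.58082 and a positive slope, r = +√0.58082 = 0.762, so

  t_STAT = (0.762 − 0) / √((1 − 0.58082) / (10 − 2)) = 3.33

the same value as the t test for the slope.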
t Test for a Correlation Coefficient

d.f. = 10 − 2 = 8; with α/2 = .025, the critical values are ±tα/2 = ±2.3060

Decision: reject H0, since t_STAT = 3.329 > 2.3060

Conclusion: there is evidence of a linear association at the 5% level of significance
