Simple Linear Regression
Regression
Learning Objectives
Types of Relationships
[Figure: scatter plots of Y against X illustrating strong relationships and weak relationships]
Types of Relationships
[Figure: scatter plots of Y against X illustrating no relationship]
The Linear Regression Model
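\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where:
Yi = Dependent variable
β0 = Population Y intercept
β1 = Population slope coefficient
Xi = Independent variable
εi = Random error term
[Figure: the population regression line (Intercept = β0, Slope = β1) plotted on Y against X. For a given Xi it marks the observed value of Y for Xi, the predicted value of Y for Xi, and the random error εi for this Xi value.]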
Linear Regression Equation:
PREDICTION LINE
The simple linear regression equation provides an
estimate of the population regression line
\[ \hat{Y}_i = b_0 + b_1 X_i \]
where:
Ŷi = Estimated (or predicted) Y value for observation i
b0 = Estimate of the regression intercept
b1 = Estimate of the regression slope
Xi = Value of X for observation i
The Least Squares Method
Using the equation
\[ \hat{Y}_i = -1.2088 + 2.0742 X_i \]
The slope, b1, is +2.0742. This means that for each increase of 1 unit in
X, the predicted value of Y is estimated to increase by 2.0742 units. In
other words, for each increase of 1.0 million profiled customers within
30 minutes of the store, the predicted mean annual sales are
estimated to increase by $2.0742 million. So the slope represents the
estimated change in the predicted mean of Y per unit change in X.
▪ The Y intercept, b0, is -1.2088. The Y intercept
represents the predicted value of Y when X
= 0. Because the number of customers
of the store cannot be zero, this Y intercept
has little or no practical interpretation.
▪ Also, X = 0 lies outside the range of the
observed values of the X variable in this
example, and therefore the interpretation of
the value of b0 should be made cautiously.
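A minimal sketch of the least squares calculation and of how b0 and b1 are read. The function names (`least_squares_fit`, `predict_sales`), the five data points, and the inputs 3.0 and 4.0 are hypothetical and chosen only for illustration; the coefficients -1.2088 and 2.0742 are the ones quoted above.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)

# Prediction line quoted in the example above
B0, B1 = -1.2088, 2.0742


def predict_sales(customers_millions):
    """Predicted mean annual sales ($ millions) for a given X."""
    return B0 + B1 * customers_millions


# Raising X by 1 unit changes the prediction by exactly the slope
slope_effect = predict_sales(4.0) - predict_sales(3.0)
```

Evaluating `predict_sales(0)` returns the intercept itself, which is why b0 only has a practical meaning when X = 0 is a plausible value.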
Interpretation of the Intercept and the Slope
Tools → Data Analysis → Regression
Linear Regression Example
Excel Output
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Observations 10
ANOVA
df SS MS F Significance F
Total 9 32600.5000
The regression equation is:
house price = 98.248 + 0.10977 (square feet)
Intercept: b0 = 98.248
Slope: b1 = 0.10977
Linear Regression Example
Interpretation of b0
Do not try to extrapolate beyond the range of observed X’s
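One way to make the extrapolation warning concrete is to have the prediction function refuse inputs outside the observed range. This is only a sketch: the coefficients are the Excel values from this example, while `predict_price`, the range limits 1000 and 2500, and the inputs 2000 and 5000 are hypothetical, since the observed range of X is not listed here.

```python
B0 = 98.248    # intercept from the Excel output
B1 = 0.10977   # slope from the Excel output


def predict_price(square_feet, x_min, x_max):
    """Predict house price, refusing to extrapolate beyond [x_min, x_max]."""
    if not (x_min <= square_feet <= x_max):
        raise ValueError(
            f"{square_feet} is outside the observed range [{x_min}, {x_max}]"
        )
    return B0 + B1 * square_feet


# Hypothetical observed range of X (not taken from the slides)
X_MIN, X_MAX = 1000, 2500

price = predict_price(2000, X_MIN, X_MAX)

try:
    predict_price(5000, X_MIN, X_MAX)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```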
where:
Ȳ = Mean value of the dependent variable
Yi = Observed values of the dependent variable
Ŷi = Predicted value of Y for the given Xi value
Measures of Variation
Coefficient of Determination, r²
▪ The coefficient of determination is the portion of
the total variation in the dependent variable that
is explained by variation in the independent
variable
▪ The coefficient of determination is also called
r-squared and is denoted as r²
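The sketch below shows how the measures of variation and r² fit together, using the standard definitions SST = Σ(Yi − Ȳ)², SSR = Σ(Ŷi − Ȳ)², SSE = Σ(Yi − Ŷi)² and r² = SSR / SST. The function name `variation_measures` and the five data points are hypothetical and serve only as an illustration.

```python
def variation_measures(x, y):
    """Return (SST, SSR, SSE) for a least squares line fitted to x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse


# Hypothetical data, not taken from the slides
sst, ssr, sse = variation_measures([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = ssr / sst
```

For a least squares fit, SST equals SSR + SSE, so r² always lies between 0 and 1.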
Coefficient of Determination, r²
r² = 1
[Figure: two plots of Y against X, each labeled r² = 1]
Coefficient of Determination, r²
0 < r² < 1
[Figure: plot of Y against X]
Coefficient of Determination, r²
r² = 0
No linear relationship between X and Y
[Figure: plot of Y against X]
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Total 9 32600.5000
58.08% of the variation in house prices is explained by variation in square feet
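As a consistency check, the other regression statistics can be approximately re-derived from R Square, the total sum of squares and n. The sketch assumes the usual simple-regression formulas (SSR = r² · SST, adjusted r² = 1 − (1 − r²)(n − 1)/(n − 2), standard error = √(SSE/(n − 2))). Because the inputs are rounded, the results match the Excel output only to a few decimals.

```python
import math

n = 10            # Observations
r2 = 0.58082      # R Square
sst = 32600.5     # Total sum of squares from the ANOVA table

ssr = r2 * sst                    # explained variation
sse = sst - ssr                   # unexplained variation

multiple_r = math.sqrt(r2)                       # compare with Multiple R
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)        # compare with Adjusted R Square
std_error = math.sqrt(sse / (n - 2))             # compare with Standard Error

print(round(multiple_r, 4), round(adj_r2, 4), round(std_error, 2))
```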
[Figure: Durbin-Watson decision regions on an axis marked 0, dL, dU, 2]
The Durbin-Watson Statistic
▪ Example with n = 25:
Excel output:
Durbin-Watson Calculations
Sum of Squared Difference of Residuals 3296.18
Sum of Squared Residuals 3279.98
Durbin-Watson Statistic 1.00494
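The statistic in the Excel output is the ratio of the two sums shown. The sketch below reproduces that ratio and also computes D directly from a residual series using D = Σ(e_i − e_{i−1})² / Σe_i². The function name `durbin_watson` and the four toy residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Values from the Excel output above
d_example = 3296.18 / 3279.98

# Hypothetical residuals that alternate in sign (negative autocorrelation)
d_toy = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

Values of D well below 2 point toward positive autocorrelation, and values well above 2 (as in the toy series) point toward negative autocorrelation.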
The Durbin-Watson Statistic
▪ Here, n = 25 and there is k = 1 independent variable
▪ Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
▪ D = 1.00494 < dL = 1.29, so reject H0 and conclude that
significant positive autocorrelation exists
▪ Therefore the linear model is not the appropriate model to
predict sales
Decision: reject H0 since
D = 1.00494 < dL
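The decision rule for positive autocorrelation can be sketched as a three-way comparison against the table values. The function name `dw_positive_autocorrelation` and the wording of the returned strings are hypothetical; the numbers are the ones from this example.

```python
def dw_positive_autocorrelation(d, d_lower, d_upper):
    """Apply the Durbin-Watson decision rule for positive autocorrelation."""
    if d < d_lower:
        return "reject H0"          # evidence of positive autocorrelation
    if d > d_upper:
        return "do not reject H0"   # no evidence of positive autocorrelation
    return "inconclusive"           # dL <= D <= dU


# n = 25, k = 1: dL = 1.29, dU = 1.45, D = 1.00494
decision = dw_positive_autocorrelation(1.00494, 1.29, 1.45)
```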
Inferences About the Slope:
t Test Example
▪ H0: β1 = 0
▪ H1: β1 ≠ 0
Test Statistic: t = 3.329
d.f. = 10 - 2 = 8
α/2 = .025 in each tail
Decision: Reject H0
▪ H0: β1 = 0
▪ H1: β1 ≠ 0
Excel output for Square Feet: Coefficient 0.10977, Standard Error 0.03297, t Stat 3.32938, P-value 0.01039
Decision: Reject H0, since p-value < α
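The t statistic in the Excel row is the slope estimate divided by its standard error. Both versions of the decision, by critical value and by p-value, can be sketched as below. The critical value 2.3060 (8 degrees of freedom, α/2 = .025) and α = .05 are the values used elsewhere in this example; the variable names are hypothetical.

```python
b1 = 0.10977        # estimated slope (Square Feet coefficient)
s_b1 = 0.03297      # standard error of the slope
beta1_null = 0.0    # value of the slope under H0

t_stat = (b1 - beta1_null) / s_b1

# Critical-value approach: two-tail test with d.f. = 10 - 2 = 8
t_crit = 2.3060
reject_by_critical_value = abs(t_stat) > t_crit

# p-value approach, using the p-value reported by Excel
p_value = 0.01039
alpha = 0.05
reject_by_p_value = p_value < alpha
```

The two approaches always agree for the same α, as they do here.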
F-Test for Significance
▪ H0: β1 = 0
▪ H1: β1 ≠ 0
▪ α = .05
▪ df1 = 1, df2 = 8
Test Statistic: F = MSR / MSE
Critical Value: F.05 = 5.32
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price
[Figure: F distribution with the rejection region (α = .05) to the right of F.05 = 5.32 and the do-not-reject region to its left]
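With a single independent variable, the F statistic equals the square of the slope's t statistic, so the F test can be checked against the t test output. The sketch below relies on that identity rather than on the ANOVA mean squares, which are not reproduced here; the variable names are hypothetical.

```python
t_stat = 3.32938     # t Stat for Square Feet from the Excel output
f_crit = 5.32        # F.05 with df1 = 1, df2 = 8

# In simple linear regression, F = t^2
f_stat = t_stat ** 2

reject_h0 = f_stat > f_crit

# The critical values are linked the same way: 2.3060^2 is about 5.32
t_crit_squared = 2.3060 ** 2
```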
Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
\[ b_1 \pm t_{\alpha/2} S_{b_1} \]
where S_{b_1} is the standard error of the slope
d.f. = n - 2
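A short sketch of a 95% interval for the house price example, assuming the usual form b1 ± t(α/2) · S_b1. The slope and its standard error come from the Excel output and 2.3060 is the t critical value for 8 degrees of freedom used in this example; the variable names are hypothetical.

```python
b1 = 0.10977      # estimated slope
s_b1 = 0.03297    # standard error of the slope
t_crit = 2.3060   # t value for d.f. = 10 - 2 = 8, alpha/2 = .025

margin = t_crit * s_b1
ci_lower = b1 - margin
ci_upper = b1 + margin

# The interval excludes 0, which agrees with rejecting H0: beta1 = 0
excludes_zero = ci_lower > 0 or ci_upper < 0

print(round(ci_lower, 4), round(ci_upper, 4))
```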
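t Test for a Correlation Coefficient
▪ Test statistic (with n – 2 degrees of freedom): t = 3.329
d.f. = 10 - 2 = 8
Critical values: -tα/2 = -2.3060 and tα/2 = 2.3060 (α/2 = .025 in each tail)
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of significance
[Figure: t distribution centered at 0 with "Reject H0" regions below -2.3060 and above 2.3060 and a "Do not reject H0" region between them; the test statistic 3.329 falls in the upper rejection region]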
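The test statistic can be reproduced from the correlation coefficient alone. The sketch assumes the standard form t = (r − ρ) / √((1 − r²)/(n − 2)) with ρ = 0 under H0, and takes r as the Multiple R from the Excel output (positive here because the slope is positive). The variable names are hypothetical.

```python
import math

r = 0.76211       # sample correlation (Multiple R; slope is positive)
n = 10            # number of observations
rho_null = 0.0    # correlation under H0

t_stat = (r - rho_null) / math.sqrt((1 - r ** 2) / (n - 2))

t_crit = 2.3060   # two-tail critical value for d.f. = 8, alpha = .05
reject_h0 = abs(t_stat) > t_crit
```

The result matches the t statistic for the slope, which is expected in simple linear regression: both tests assess the same linear association.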