06 Least Squares Regression
[Figure: House cost versus house size. Building a house costs about $75 per square foot, and most lots sell for $25,000, so House cost = 25000 + 75(Size).]
The Model
However, house costs vary even among houses of the same size.
Since costs behave unpredictably, we add a random component.
The Model
• The first order linear model
$$y = \beta_0 + \beta_1 x + \varepsilon$$
y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (Rise/Run)
$\varepsilon$ = error variable
$\beta_0$ and $\beta_1$ are unknown population parameters, and are therefore estimated from the data.
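A minimal sketch of the first order linear model, using the house-cost figures from the earlier slide. The normal error distribution and the error standard deviation of 10,000 are assumptions made only for illustration; the function names are hypothetical.

```python
import random

BETA0 = 25_000   # y-intercept: lot price from the house-cost example
BETA1 = 75       # slope: building cost per square foot
SIGMA = 10_000   # hypothetical standard deviation of the error variable

def expected_cost(size):
    """Deterministic part of the model: beta0 + beta1 * x."""
    return BETA0 + BETA1 * size

def simulated_cost(size, rng):
    """Full model: deterministic part plus a random error term."""
    epsilon = rng.gauss(0, SIGMA)  # assumed normal error with mean 0
    return expected_cost(size) + epsilon

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(3):
    # Three houses of the same size get three different costs.
    print(round(simulated_cost(2000, rng)))
```

Every simulated 2,000 square foot house shares the same expected cost of 175,000, yet the individual costs differ because of the error term.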
Estimating the Coefficients
The Least Squares (Regression) Line
Let us compare two lines fitted to the points (1,2), (2,4), (3,1.5), and (4,3.2).
First line (fitted values 1, 2, 3, 4):
Sum of squared differences = $(2 - 1)^2 + (4 - 2)^2 + (1.5 - 3)^2 + (3.2 - 4)^2 = 7.89$
The second line is horizontal at y = 2.5:
Sum of squared differences = $(2 - 2.5)^2 + (4 - 2.5)^2 + (1.5 - 2.5)^2 + (3.2 - 2.5)^2 = 3.99$
The smaller the sum of squared differences, the better the fit of the line to the data.
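The comparison of the two lines can be checked with a short sketch. It assumes the first line is y = x, which is what the fitted values 1, 2, 3, 4 imply; the helper name is hypothetical.

```python
# The four data points from the slide.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_squared_differences(line, data):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - line(x)) ** 2 for x, y in data)

def first_line(x):
    return x        # assumed: fitted values 1, 2, 3, 4

def second_line(x):
    return 2.5      # horizontal line

ssd_first = sum_squared_differences(first_line, points)
ssd_second = sum_squared_differences(second_line, points)
print(round(ssd_first, 2), round(ssd_second, 2))
```

The horizontal line gives the smaller sum, so by this criterion it fits these four points better than the first line.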
The Estimated Coefficients
To calculate the estimates of the slope and intercept of the least squares line, use the formulas:
$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where
$$SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$
$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = (n-1)s_x^2$$
Alternate formula for the slope $b_1$:
$$b_1 = r\,\frac{s_y}{s_x}$$
The regression equation that estimates the equation of the first order linear model is:
$$\hat{y} = b_0 + b_1 x$$
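As an illustration, the formulas for $b_1$ and $b_0$ can be applied to the four points from the line-comparison slide. This is a sketch only; the function name `least_squares` is hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) computed from SS_xy and SS_xx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                 # slope
    b0 = sum_y / n - b1 * sum_x / n    # intercept: y-bar minus b1 times x-bar
    return b0, b1

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
b0, b1 = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 2.40 + 0.11x
```

For these points the least squares line has a sum of squared differences of about 3.81, slightly below the 3.99 of the horizontal line at 2.5.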
The Simple Linear Regression Line
• Example:
– A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
– A random sample of 100 cars is selected, and the data recorded.
– Find the regression line.

Car   Odometer   Price
1     37388      14636
2     44758      14122
3     45833      14016
4     30862      15590
5     31705      15568
6     34010      14718
.     .          .

Independent variable x = Odometer; dependent variable y = Price.
The Simple Linear Regression Line
• Solution
– Solving by hand: Calculate a number of statistics
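$\bar{x} = 36{,}009.45$; $SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 43{,}528{,}690$
$\bar{y} = 14{,}822.823$; $SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = -2{,}712{,}511$
where n = 100.
$$b_1 = \frac{SS_{xy}}{(n-1)s_x^2} = \frac{-2{,}712{,}511}{43{,}528{,}690} = -.06232$$
$$b_0 = \bar{y} - b_1\bar{x} = 14{,}822.82 - (-.06232)(36{,}009.45) = 17{,}067$$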
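The hand calculation can be reproduced from the summary statistics alone. This sketch takes the xy sum of squares as negative, which the reported intercept of 17,067 requires; the variable names are illustrative.

```python
# Summary statistics reported for the 100 used cars.
x_bar = 36_009.45     # mean odometer reading
y_bar = 14_822.823    # mean selling price
ss_xy = -2_712_511    # negative: price falls as the odometer reading rises
ss_xx = 43_528_690

b1 = ss_xy / ss_xx          # slope
b0 = y_bar - b1 * x_bar     # intercept

print(f"b1 = {b1:.5f}, b0 = {b0:,.0f}")  # b1 = -0.06232, b0 = 17,067
```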
1. Scatterplot
2. Trend function
3. Tools > Data Analysis > Regression
The Simple Linear Regression Line
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.8063
R Square            0.6501
Adjusted R Square   0.6466
Standard Error      303.1
Observations        100

ANOVA
             df   SS         MS         F        Significance F
Regression   1    16734111   16734111   182.11   0.0000
Residual     98   9005450    91892
Total        99   25739561

$$\hat{y} = 17{,}067 - .0623x$$
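The entries in the summary output are linked by standard identities, which a short sketch can confirm from the two sums of squares. The variable names are illustrative and are not Excel's.

```python
from math import sqrt

# Sums of squares and sample size from the ANOVA table.
ss_regression = 16_734_111
ss_residual = 9_005_450
n = 100

df_regression = 1
df_residual = n - 2

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
multiple_r = sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"R Square = {r_square:.4f}, Multiple R = {multiple_r:.4f}")
print(f"Adjusted R Square = {adjusted_r_square:.4f}, F = {f_stat:.2f}")
```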
Interpreting the Linear Regression Equation
[Figure: Odometer Line Fit Plot of Price against Odometer, with the fitted line $\hat{y} = 17{,}067 - .0623x$. The intercept is 17,067; there are no data near Odometer = 0.]
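A small sketch of how the fitted equation is used, taking the slope as negative. The odometer reading of 40,000 is a hypothetical input chosen inside the range of the sample shown earlier, and the function name is illustrative.

```python
B0 = 17_067     # intercept of the fitted line
B1 = -0.0623    # slope: change in predicted price per additional mile

def predicted_price(odometer):
    """Predicted selling price from the fitted regression line."""
    return B0 + B1 * odometer

# Hypothetical car with 40,000 miles on the odometer.
print(round(predicted_price(40_000)))  # 14575
```

Each additional mile lowers the predicted price by 0.0623. The intercept of 17,067 should not be read as the price of an undriven car, because the sample has no data near zero.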
From the first three assumptions we have:
y is normally distributed with mean $E(y) = \beta_0 + \beta_1 x$ and a constant standard deviation $\sigma_\varepsilon$.
[Figure: normal distributions of y centered on the line $\beta_0 + \beta_1 x$ at $x_1$, $x_2$, and $x_3$.]
Assessing the Model
• The least squares method produces a regression line whether or not there is a linear relationship between x and y.
• Consequently, it is important to assess how well
the linear model fits the data.
• Several methods are used to assess the model.
All are based on the sum of squares for errors,
SSE.
Sum of Squares for Errors
– This is the sum of squared differences between the points and the regression line.
– It can serve as a measure of how well the line fits the data. SSE is defined by
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
– A shortcut formula
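The shortcut formula itself did not survive in the slide text. One standard identity, shown here as an assumption about what was intended, is $SSE = SS_{yy} - SS_{xy}^2 / SS_{xx}$. The sketch below checks it against the definition of SSE on the four points from the line-comparison slide; the function names are hypothetical.

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]

def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) for paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sx ** 2 / n
    ss_yy = sum(y * y for y in ys) - sy ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return ss_xx, ss_yy, ss_xy

def sse_direct(xs, ys):
    """SSE from the definition: sum of squared residuals."""
    ss_xx, _, ss_xy = sums_of_squares(xs, ys)
    b1 = ss_xy / ss_xx
    b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def sse_shortcut(xs, ys):
    """SSE from the assumed shortcut: SS_yy - SS_xy^2 / SS_xx."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_yy - ss_xy ** 2 / ss_xx

print(round(sse_direct(xs, ys), 3), round(sse_shortcut(xs, ys), 3))
```

Both routes give the same value, so the shortcut avoids computing each fitted value.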
Standard Error of Estimate
$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$$
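As a quick check of the formula, this sketch applies it to the residual sum of squares and sample size reported in the regression output for the used-car data. The function name is hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(sse, n):
    """Square root of SSE divided by its degrees of freedom, n - 2."""
    return sqrt(sse / (n - 2))

# Residual SS and number of observations from the used-car output.
s_eps = standard_error_of_estimate(9_005_450, 100)
print(round(s_eps, 1))  # 303.1
```

The result agrees with the Standard Error entry in the regression output.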
Standard Error of Estimate, Example
• Example:
– Calculate the standard error of estimate for the previous
example and describe what it tells you about the model fit.
• Solution