Simple Regression Model CH02
$$\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0 \qquad (2.2)$$
Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected value (or average) of u for that slice of the population. The crucial assumption is that this average does not depend on the value of x (u is mean independent of x); combined with E(u) = 0, this gives the zero conditional mean assumption

$$E(u \mid x) = E(u) = 0. \qquad (2.6)$$
For a random sample, the model can be written for each observation as

$$y_i = \beta_0 + \beta_1 x_i + u_i. \qquad (2.9)$$

The population moment conditions

$$E(u) = 0 \qquad (2.10)$$

and E(xu) = 0 have sample counterparts that the OLS estimates must satisfy:

$$n^{-1}\sum_{i=1}^{n}\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0 \qquad (2.14)$$

and

$$n^{-1}\sum_{i=1}^{n} x_i\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0. \qquad (2.15)$$
The first condition can be rewritten as $\bar y = \hat\beta_0 + \hat\beta_1 \bar x$, where

$$\bar y = n^{-1}\sum_{i=1}^{n} y_i$$

is the sample average of the $y_i$, and $\bar x$ is defined similarly. This equation allows us to write $\hat\beta_0$ in terms of $\hat\beta_1$, $\bar x$, and $\bar y$:

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x. \qquad (2.17)$$
Drop the $n^{-1}$ in (2.15) (it does not affect the solution) and substitute equation (2.17) into equation (2.15) to get

$$\sum_{i=1}^{n} x_i\bigl(y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i\bigr) = 0,$$

which, upon rearrangement, gives

$$\sum_{i=1}^{n} x_i(y_i - \bar y) = \hat\beta_1 \sum_{i=1}^{n} x_i(x_i - \bar x).$$

By basic properties of the summation operator,

$$\sum_{i=1}^{n} x_i(x_i - \bar x) = \sum_{i=1}^{n} (x_i - \bar x)^2 \quad\text{and}\quad \sum_{i=1}^{n} x_i(y_i - \bar y) = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y).$$
Therefore, provided that

$$\sum_{i=1}^{n} (x_i - \bar x)^2 > 0, \qquad (2.18)$$

the estimated slope is

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}. \qquad (2.19)$$
Equation (2.19) is simply the sample covariance between $x_i$ and $y_i$ divided by the sample variance of $x_i$. If $x_i$ and $y_i$ are positively correlated in the sample, then $\hat\beta_1 > 0$; if they are negatively correlated, then $\hat\beta_1 < 0$.
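As an illustration, here is a minimal sketch that computes the slope and intercept directly from equations (2.19) and (2.17) with NumPy. The data array and variable names are hypothetical, chosen only for the example.

```python
import numpy as np

# Hypothetical sample data (illustrative only; not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Equation (2.19): sample covariance of x and y divided by the sample variation in x.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Equation (2.17): intercept recovered from the sample means and the slope.
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)
```

For comparison, `np.polyfit(x, y, 1)` solves the same least squares problem and returns the slope and intercept (in that order).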
See Figure 2.3 in Chapter 2 (page 32).
The estimates in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$.
Given any $\hat\beta_0$ and $\hat\beta_1$, define the fitted value of $y$ when $x = x_i$ as

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i.$$

The residual for observation $i$ is the difference between the actual $y_i$ and its fitted value:

$$\hat u_i = y_i - \hat y_i.$$
The OLS estimates are chosen to make the sum of squared residuals,

$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr)^2, \qquad (2.22)$$

as small as possible.
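To see the minimization in (2.22) at work, the following sketch minimizes the sum of squared residuals numerically and compares the result with the closed-form estimates; the data and starting values are hypothetical, and `scipy.optimize.minimize` is used only as a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data; closed-form OLS estimates from (2.19) and (2.17) for comparison.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def ssr(params):
    """Sum of squared residuals in (2.22) as a function of (intercept, slope)."""
    intercept, slope = params
    return np.sum((y - intercept - slope * x) ** 2)

result = minimize(ssr, x0=[0.0, 0.0])   # generic numerical minimization
print(result.x)                          # approximately equal to (b0, b1)
print(b0, b1)
```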
Once the OLS intercept and slope estimates have been determined, the OLS regression line is

$$\hat y = \hat\beta_0 + \hat\beta_1 x.$$

Given $\hat\beta_0$ and $\hat\beta_1$, every fitted value $\hat y_i$ lies on this estimated line. If the residual $\hat u_i$ is positive, the line underpredicts $y_i$; if $\hat u_i$ is negative, the line overpredicts $y_i$.
2-3b Algebraic properties of OLS statistics
OLS estimates and their associated statistics have several useful algebraic properties. We now present the three most important ones.
(1) The sum, and therefore the sample average, of the OLS residuals is zero. Mathematically,

$$\sum_{i=1}^{n} \hat u_i = 0. \qquad (2.30)$$
(2) The sample covariance between the regressors and the OLS residuals is zero:

$$\sum_{i=1}^{n} x_i \hat u_i = 0. \qquad (2.31)$$

(3) The point $(\bar x, \bar y)$ always lies on the OLS regression line. Each observation can be written as its fitted value plus its residual:

$$y_i = \hat y_i + \hat u_i. \qquad (2.32)$$
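Continuing the hypothetical example from the earlier sketch, the following check confirms the three algebraic properties numerically (the sums are zero up to floating-point error).

```python
import numpy as np

# Hypothetical data; OLS estimates via (2.19) and (2.17).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x      # fitted values
u_hat = y - y_hat        # residuals, so y_i = y_hat_i + u_hat_i as in (2.32)

print(np.isclose(u_hat.sum(), 0.0))               # property (1): residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))         # property (2): zero sample covariance with x
print(np.isclose(y.mean(), b0 + b1 * x.mean()))   # property (3): (x_bar, y_bar) is on the line
```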
The total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) are defined as follows:

$$\text{SST} = \sum_{i=1}^{n} (y_i - \bar y)^2 \qquad (2.33)$$

$$\text{SSE} = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 \qquad (2.34)$$

$$\text{SSR} = \sum_{i=1}^{n} \hat u_i^2 \qquad (2.35)$$
Using the fact that

$$\sum_{i=1}^{n} \hat u_i(\hat y_i - \bar y) = 0,$$

it follows that

$$\text{SST} = \text{SSE} + \text{SSR}. \qquad (2.36)$$
Assume the total sum of squares SST is not zero; except in the case where all the $y_i$ are equal, this must be true. We can then divide equation (2.36) by SST to get $1 = \text{SSE}/\text{SST} + \text{SSR}/\text{SST}$. The R-squared of the regression, sometimes called the coefficient of determination, is defined as

$$R^2 = \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}.$$

Since SSE can be no greater than SST, $R^2$ must lie between 0 and 1. When interpreting $R^2$, we usually multiply it by 100 to convert it to a percentage: $100 \cdot R^2$ is the percentage of the sample variation in $y$ that is explained by $x$.
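A quick numerical check with the same hypothetical data: compute SST, SSE, and SSR from (2.33) through (2.35), verify the decomposition (2.36), and form the R-squared.

```python
import numpy as np

# Hypothetical data; OLS estimates via (2.19) and (2.17).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares (2.33)
sse = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares (2.34)
ssr = np.sum(u_hat ** 2)                # residual sum of squares (2.35)

print(np.isclose(sst, sse + ssr))       # SST = SSE + SSR, equation (2.36)
print(1.0 - ssr / sst)                  # R-squared, a value between 0 and 1
```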
To study the expected values of the OLS estimators, write the population model as

$$y = \beta_0 + \beta_1 x + u, \qquad (2.47)$$

where $\beta_0$ and $\beta_1$ are the population intercept and slope parameters (the model is linear in parameters), and assume the zero conditional mean condition

$$E(u \mid x) = 0.$$
Substituting $y_i = \beta_0 + \beta_1 x_i + u_i$ into the numerator of $\hat\beta_1$ gives

$$\sum_{i=1}^{n}(x_i - \bar x)\,y_i = \beta_0 \sum_{i=1}^{n}(x_i - \bar x) + \beta_1 \sum_{i=1}^{n}(x_i - \bar x)\,x_i + \sum_{i=1}^{n}(x_i - \bar x)\,u_i. \qquad (2.51)$$
As shown earlier,

$$\sum_{i=1}^{n}(x_i - \bar x) = 0 \quad\text{and}\quad \sum_{i=1}^{n}(x_i - \bar x)\,x_i = \sum_{i=1}^{n}(x_i - \bar x)^2 = \text{SST}_x.$$

Therefore, the numerator of $\hat\beta_1$ equals $\beta_1 \text{SST}_x + \sum_{i=1}^{n}(x_i - \bar x)u_i$, and dividing by $\text{SST}_x$ gives

$$\hat\beta_1 = \beta_1 + \frac{1}{\text{SST}_x}\sum_{i=1}^{n}(x_i - \bar x)\,u_i = \beta_1 + \frac{1}{\text{SST}_x}\sum_{i=1}^{n} d_i u_i, \qquad (2.52)$$

where $d_i = x_i - \bar x$.
In other words, $\hat\beta_0$ is an unbiased estimator of $\beta_0$, and $\hat\beta_1$ is an unbiased estimator of $\beta_1$.
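Unbiasedness is a statement about the sampling distribution of the estimators, so it can be illustrated by simulation. The sketch below assumes specific population values ($\beta_0 = 1$, $\beta_1 = 2$) and draws $u$ independently of $x$, so that $E(u \mid x) = 0$ holds by construction; averaged across many samples, the OLS slope estimates center on the true $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0          # assumed "true" population parameters (illustrative)
n, n_reps = 100, 5000

slopes = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(5.0, 2.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)   # drawn independently of x, so E(u | x) = 0
    y = beta0 + beta1 * x + u
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# The average of the OLS slope estimates across repeated samples is close to beta1.
print(slopes.mean())
```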
The residuals can be written as

$$\hat u_i = u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)\,x_i. \qquad (2.59)$$

The error variance is estimated by $\hat\sigma^2 = \text{SSR}/(n-2)$, and its square root,

$$\hat\sigma = \sqrt{\hat\sigma^2}, \qquad (2.62)$$

is called the standard error of the regression (SER).
Since $\mathrm{sd}(\hat\beta_1) = \sigma/\sqrt{\text{SST}_x}$, a natural estimator of $\mathrm{sd}(\hat\beta_1)$ is

$$\mathrm{se}(\hat\beta_1) = \hat\sigma/\sqrt{\text{SST}_x} = \hat\sigma\Big/\Bigl(\sum_{i=1}^{n}(x_i - \bar x)^2\Bigr)^{1/2},$$

which is called the standard error of $\hat\beta_1$.
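A minimal sketch, again with hypothetical data, that computes $\hat\sigma^2 = \text{SSR}/(n-2)$, the SER from (2.62), and the standard error of the slope:

```python
import numpy as np

# Hypothetical data; OLS estimates as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n = x.size
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)   # SSR / (n - 2), estimator of the error variance
ser = np.sqrt(sigma2_hat)                   # standard error of the regression (2.62)
sst_x = np.sum((x - x.mean()) ** 2)
se_b1 = ser / np.sqrt(sst_x)                # standard error of the slope estimate

print(ser, se_b1)
```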
Now suppose we estimate a line of the form

$$\tilde y = \tilde\beta_1 x, \qquad (2.63)$$

where the tildes on $\tilde\beta_1$ and $\tilde y$ distinguish this problem from the usual case in which the slope and intercept are estimated at the same time. Because the line in (2.63) passes through the point $x = 0$, $\tilde y = 0$, it is called regression through the origin. The slope estimate is chosen to minimize the sum of squared residuals

$$\sum_{i=1}^{n}\bigl(y_i - \tilde\beta_1 x_i\bigr)^2. \qquad (2.64)$$
Using calculus, it can be shown that $\tilde\beta_1$ must solve the first-order condition

$$\sum_{i=1}^{n} x_i\bigl(y_i - \tilde\beta_1 x_i\bigr) = 0. \qquad (2.65)$$

Solving for $\tilde\beta_1$ gives

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}. \qquad (2.66)$$
For regression through the origin, an R-squared computed as $1 - \text{SSR}/\text{SST}$ (with SST centered at $\bar y$) may be negative.
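The following sketch with hypothetical data computes the origin-regression slope from (2.66) and shows a case where forcing the line through the origin fits so poorly that this R-squared falls below zero.

```python
import numpy as np

# Hypothetical data for which forcing the line through the origin fits poorly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 10.1, 9.9, 10.3, 10.0])   # essentially flat, far above the origin

beta1_tilde = np.sum(x * y) / np.sum(x ** 2)   # equation (2.66)
u_tilde = y - beta1_tilde * x                  # residuals from the origin regression

ssr = np.sum(u_tilde ** 2)
sst = np.sum((y - y.mean()) ** 2)
print(beta1_tilde, 1.0 - ssr / sst)            # the second value is negative here
```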
For the model

$$y = \beta_0 + \beta_1 x + u,$$

evaluating the conditional expectation at $x = 0$ and $x = 1$ gives

$$E(y \mid x = 0) = \beta_0 \qquad (2.70)$$

and

$$E(y \mid x = 1) = \beta_0 + \beta_1, \qquad (2.71)$$

so that $\beta_1 = E(y \mid x = 1) - E(y \mid x = 0)$.
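These two expressions are most natural when $x$ is a binary (0/1) variable; in that case the OLS intercept equals the sample mean of $y$ in the $x = 0$ group and the OLS slope equals the difference in the two group means. A quick check with hypothetical data:

```python
import numpy as np

# Hypothetical sample in which x is a binary (0/1) variable.
x = np.array([0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([3.0, 2.5, 3.5, 5.0, 6.0, 5.5, 5.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Intercept = mean of y when x = 0; slope = difference in group means.
print(np.isclose(b0, y[x == 0].mean()))
print(np.isclose(b1, y[x == 1].mean() - y[x == 0].mean()))
```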