
Chapter 2

The Simple Regression Model


2-1 Definition of the simple linear regression model

• The simple linear regression model is:

y = β₀ + β₁x + u    (2.1)

• Suppose we are interested in "how y changes when x changes."

• If (2.1) holds, it defines a simple linear regression model.

• Because it represents the relationship between the two variables x and y, it is also called the two-variable or bivariate linear regression model.

• In equation (2.1), the variables y and x go by several different names, used interchangeably.
Simple regression terminology

• In econometrics, the names "dependent variable" and "independent variable" are used most often. Note, however, that the word "independent" here does not refer to the statistical notion of independence between random variables.
• The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y.
• Simple regression analysis treats all factors affecting y other than x as unobserved.
• You can think of u as standing for "unobserved."
• Equation (2.1) also addresses the functional relationship between y and x. If the other factors in u are held fixed, so that the change in u is zero (Δu = 0), then x has a linear effect on y:

Δy = β₁Δx  if  Δu = 0    (2.2)

• Thus the change in y is simply β₁ times the change in x.

• That is, holding the other factors in u constant, β₁ is the slope parameter of the relationship between y and x; it is of primary interest in applied economics.
• The intercept parameter β₀, sometimes called the constant term, also has its uses, although it is rarely central to the analysis.

• Examples 2.1 and 2.2 on page 25.


• Only by restricting how the unobservable u is related to the explanatory variable x can we obtain estimators of β₀ and β₁ from a random sample. Without such an assumption, we cannot estimate the ceteris paribus effect β₁.
• Before stating the key assumption about how x and u are related, there is one assumption about u we can always make: as long as the intercept β₀ is included in the equation, we can assume that the population mean of u is zero:

E(u) = 0    (2.5)
• We now turn to the crucial assumption about how u and x are related. A natural measure of the association between two random variables is the correlation coefficient.
• Correlation coefficient: measures the linear relationship between two variables.
• Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected value (or mean) of u. The crucial assumption is that the mean of u does not depend on the value of x (u is mean independent of x):

E(u | x) = E(u) = 0    (2.6)

• The second equality is just equation (2.5). The first equality in (2.6) is the new assumption, called the zero conditional mean assumption.
• Taking the expectation of (2.1) conditional on x and using E(u | x) = 0 gives

E(y | x) = β₀ + β₁x    (2.8)

• Equation (2.8) shows that the population regression function (PRF), E(y | x), is a linear function of x.
• β₀ + β₁x is called the systematic part of y, and u the nonsystematic part.
• Example: page 27.
2-2 Deriving the ordinary least squares estimates

• First, we need a sample drawn from the population.
• Let {(xᵢ, yᵢ): i = 1, …, n} denote a random sample of size n from the population. Because these data come from equation (2.1), we can write

yᵢ = β₀ + β₁xᵢ + uᵢ    (2.9)

• Because uᵢ contains all factors affecting yᵢ other than xᵢ, it is the error term for observation i.
• Example: page 28.
• In the population, u has mean zero and is uncorrelated with x. Therefore, the expected value of u is zero and the covariance between x and u is zero:

E(u) = 0    (2.10)

and

Cov(x, u) = E(xu) = 0    (2.11)


• In terms of the observable variables x and y and the unknown parameters β₀ and β₁, (2.10) and (2.11) can be written as

E(y − β₀ − β₁x) = 0    (2.12)

and

E[x(y − β₀ − β₁x)] = 0    (2.13)


• Given a sample of data, we choose β̂₀ and β̂₁ to solve the sample counterparts of equations (2.12) and (2.13):

n⁻¹ Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (2.14)

and

n⁻¹ Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0    (2.15)

• This is an example of the method of moments approach to estimation.


• Using basic properties of the summation operator, equation (2.14) can be rewritten as

ȳ = β̂₀ + β̂₁x̄    (2.16)

where ȳ = n⁻¹ Σᵢ₌₁ⁿ yᵢ is the sample average of the yᵢ, and x̄ is defined similarly. This equation allows us to express β̂₀ in terms of β̂₁, x̄, and ȳ:

β̂₀ = ȳ − β̂₁x̄    (2.17)

• Dropping the n⁻¹ in (2.15) (since it does not affect the solution) and substituting (2.17) into (2.15) gives

Σᵢ₌₁ⁿ xᵢ[yᵢ − (ȳ − β̂₁x̄) − β̂₁xᵢ] = 0

• Rearranging gives

Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) = β̂₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄)

• Using basic properties of the summation operator again,

Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = Σᵢ₌₁ⁿ (xᵢ − x̄)²  and  Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)
• Therefore, provided that

Σᵢ₌₁ⁿ (xᵢ − x̄)² > 0    (2.18)

• the estimated slope is

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²    (2.19)
• Equation (2.19) is simply the sample covariance between xᵢ and yᵢ divided by the sample variance of xᵢ.
• If xᵢ and yᵢ are positively correlated in the sample, then β̂₁ > 0; if they are negatively correlated, then β̂₁ < 0.

(Figure 2.3, p. 32)
• The estimates in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of β₀ and β₁.
• Given β̂₀ and β̂₁, define the fitted value of y when x = xᵢ as

ŷᵢ = β̂₀ + β̂₁xᵢ    (2.20)

• The difference between the actual yᵢ and its fitted value is the residual for observation i:

ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ    (2.21)



Ordinary least squares estimates

(Figure 2.4, p. 33)


• The OLS estimates β̂₀ and β̂₁ are chosen to make the sum of squared residuals

Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²    (2.22)

as small as possible.
• After determining the OLS intercept and slope estimates, we obtain the OLS regression line:

ŷ = β̂₀ + β̂₁x    (2.23)


• Because (2.23), ŷ = β̂₀ + β̂₁x, is the estimated version of the population regression function E(y | x) = β₀ + β₁x, it is also called the sample regression function (SRF).
• Remember that the PRF is fixed but unknown in the population.



• The slope estimate can be written as

β̂₁ = Δŷ / Δx    (2.24)

• It tells us the amount by which ŷ changes when x increases by one unit:

Δŷ = β̂₁Δx    (2.25)

• So given any change in x (whether positive or negative), we can compute the predicted change in y.



(Figure 2.5)

2-3 Properties of OLS on any sample of data

2-3a OLS fitted values and residuals

 yˆ  ˆby
given̂ 0and̂,1 both are on the regression line estimated 0  ˆx .

OLS
1
 , then the y value is underestimated.
 , then the y value is overestimated.
2-3b Algebraic properties of OLS statistics

• The OLS estimates and their associated statistics have several useful algebraic properties. Here are the three most important ones.

(1) The sum, and therefore the sample average, of the OLS residuals is zero. Mathematically,

Σᵢ₌₁ⁿ ûᵢ = 0    (2.30)

(2) The sample covariance between the independent variable and the OLS residuals is zero. This property follows from the first-order condition (2.15), which, written in terms of the residuals, is

Σᵢ₌₁ⁿ xᵢûᵢ = 0    (2.31)



(3) The point (x̄, ȳ) is always on the OLS regression line: if we substitute x̄ for x in (2.23), the predicted value is ȳ.

• We can write each yᵢ as its fitted value plus its residual:

yᵢ = ŷᵢ + ûᵢ    (2.32)



Fitted values and residuals

(Table 2.2, p. 39)


2-3b Algebraic properties of OLS statistics

• The total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) are defined as follows:

SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²    (2.33)

SSE = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²    (2.34)

SSR = Σᵢ₌₁ⁿ ûᵢ²    (2.35)



• The total variation in y can be expressed as the sum of the explained variation (SSE) and the unexplained variation (SSR):

SST = SSE + SSR    (2.36)

• Equation (2.36) holds provided we can show that

Σᵢ₌₁ⁿ ûᵢ(ŷᵢ − ȳ) = 0    (2.37)



2-3c Goodness-of-fit

• Assuming the total sum of squares SST is not zero (which is true unless all the yᵢ take the same value), we can divide (2.36) by SST to get 1 = SSE/SST + SSR/SST. The R-squared of the regression, sometimes called the coefficient of determination, is defined as

R² = SSE/SST = 1 − SSR/SST    (2.38)

• Because SSE can be no larger than SST, R² must lie between 0 and 1.
• When interpreting R², we usually multiply it by 100 to convert it into a percentage: 100·R² is the percentage of the sample variation in y that is explained by x.
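Continuing the same sketch, the decomposition (2.36) and the R-squared in (2.38) take only a few lines:

```python
# Sketch: SST, SSE, SSR and R-squared, continuing the same sketch.
sst = sum((yi - y_bar) ** 2 for yi in y)        # (2.33)
sse = sum((yhi - y_bar) ** 2 for yhi in y_hat)  # (2.34)
ssr = sum(ui ** 2 for ui in u_hat)              # (2.35)

assert abs(sst - (sse + ssr)) < 1e-9            # decomposition (2.36)
r2 = 1 - ssr / sst                              # (2.38)
print(f"R-squared = {r2:.4f}: x explains {100 * r2:.1f}% of the sample variation in y")
```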



2-4 Units of measurement and functional forms

2-4a The effects of changing units of measurement on OLS statistics



2-4b Incorporating nonlinearities in simple regression

(Figure 2.6, p. 45)


(Table 2.3)
2-4c The meaning of "linear" regression

• The term "linear" in linear regression refers to how the parameters enter the model: the model must be linear in the parameters β₀ and β₁.
• Simple regression places no restriction on how y and x are defined, but the interpretation of the coefficients does depend on those definitions. An example follows.
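For example, here is a minimal sketch (with invented wage and education numbers) of a model that is nonlinear in the variables but linear in the parameters, log(wage) = β₀ + β₁educ + u:

```python
# Standalone sketch: log(wage) = b0 + b1*educ + u is nonlinear in wage but
# linear in the parameters, so the OLS formulas apply unchanged once we
# regress log(wage) on educ. All numbers are invented for illustration.
import math

educ = [11.0, 12.0, 12.0, 14.0, 16.0]   # hypothetical years of schooling
wage = [3.10, 3.24, 5.30, 6.00, 8.75]   # hypothetical hourly wages
lwage = [math.log(w) for w in wage]

m = len(educ)
eb = sum(educ) / m
lb = sum(lwage) / m
slope = sum((e - eb) * (lw - lb) for e, lw in zip(educ, lwage)) \
        / sum((e - eb) ** 2 for e in educ)
const = lb - slope * eb
# slope is (approximately) a semi-elasticity: one more year of education
# changes wage by roughly 100*slope percent.
print(f"log(wage) = {const:.3f} + {slope:.3f} educ")
```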
2-5 Expected values and variances of the OLS estimators

Assumption SLR.1 (Linear in parameters)


• In the population model, the dependent variable y is related to the independent variable x and the error (or disturbance) term u as

y = β₀ + β₁x + u    (2.47)

where β₀ and β₁ are the population intercept and slope parameters.



Assumption SLR.2 (Random sampling)
• We have a random sample of size n, {(xᵢ, yᵢ): i = 1, 2, …, n}, following the population model in equation (2.47).



The unbiasedness of OLS

(Figure 2.7, p. 49)


Assumption SLR.3 (Sample variation in the explanatory variable)

• The sample outcomes on x, namely {xᵢ, i = 1, …, n}, are not all the same value.



Assumption SLR.4 (Zero conditional mean)
• The error u has an expected value of zero given any value of the explanatory variable. In other words,

E(u | x) = 0



The unbiasedness of OLS

• Writing the numerator of β̂₁ in terms of the population coefficients and errors gives

Σᵢ₌₁ⁿ (xᵢ − x̄)yᵢ = β₀ Σᵢ₌₁ⁿ (xᵢ − x̄) + β₁ Σᵢ₌₁ⁿ (xᵢ − x̄)xᵢ + Σᵢ₌₁ⁿ (xᵢ − x̄)uᵢ    (2.51)

• Moreover,

Σᵢ₌₁ⁿ (xᵢ − x̄) = 0  and  Σᵢ₌₁ⁿ (xᵢ − x̄)xᵢ = Σᵢ₌₁ⁿ (xᵢ − x̄)² = SSTₓ



i 1 xiti above
x ui the
n
 Therefore, we can̂write
1
̂1SST
the numerator ofx .Write
denominator to get
n

 x  x u
i i n
ˆ1  1  i 1
 1  1 SSTx  d i ui 2.52
SSTx i 1



Theorem 2.1 (Unbiasedness of OLS)

Under Assumptions SLR.1 through SLR.4,

E(β̂₀) = β₀  and  E(β̂₁) = β₁    (2.53)

for any values of β₀ and β₁. In other words, β̂₀ is an unbiased estimator of β₀, and β̂₁ is an unbiased estimator of β₁. (A simulation sketch follows.)
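To see Theorem 2.1 at work, here is a small Monte Carlo sketch; the population values β₀ = 1, β₁ = 0.5 and the design of x are invented for illustration:

```python
# Sketch: a small Monte Carlo illustration of Theorem 2.1. We fix a "true"
# population with beta0 = 1.0 and beta1 = 0.5 (invented values), draw many
# random samples, and average the OLS slope estimates across samples.
import random

random.seed(0)
beta0, beta1, n_obs, reps = 1.0, 0.5, 50, 2000
slopes = []
for _ in range(reps):
    xs = [random.uniform(0.0, 10.0) for _ in range(n_obs)]
    us = [random.gauss(0.0, 1.0) for _ in range(n_obs)]   # E(u|x) = 0 by construction
    ys = [beta0 + beta1 * xi + ui for xi, ui in zip(xs, us)]
    xb, yb = sum(xs) / n_obs, sum(ys) / n_obs
    slopes.append(sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys))
                  / sum((xi - xb) ** 2 for xi in xs))

# The sample mean of the estimates should be very close to beta1 = 0.5.
print(f"average of {reps} slope estimates: {sum(slopes) / reps:.4f}")
```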



Assumption SLR.5 (Homoskedasticity)

• The error u has the same variance given any value of the explanatory variable. In other words,

Var(u | x) = σ²



Variances of the OLS estimators
• Because Var(u | x) = E(u² | x) − [E(u | x)]² and E(u | x) = 0, it follows that σ² = E(u² | x), so σ² is also the unconditional expectation of u².
• Because E(u) = 0, we have σ² = E(u²) = Var(u).
• In other words, σ² is the unconditional variance of u, and so σ² is often called the error variance or disturbance variance.
• Its square root, σ, is the standard deviation of the error.



• We can write Assumptions SLR.4 and SLR.5 in terms of the conditional mean and conditional variance of y (page 55):

E(y | x) = β₀ + β₁x    (2.55)
Var(y | x) = σ²    (2.56)

• When Var(u | x) depends on x, the error term is said to exhibit heteroskedasticity (or nonconstant variance). Because Var(u | x) = Var(y | x), heteroskedasticity is present whenever Var(y | x) is a function of x.



(Figures 2.8 and 2.9, p. 56)
2-5c Estimating the error variance
• Using equations (2.32) and (2.48), we can write the residuals as a function of the errors:

ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ = (β₀ + β₁xᵢ + uᵢ) − β̂₀ − β̂₁xᵢ

or

ûᵢ = uᵢ − (β̂₀ − β₀) − (β̂₁ − β₁)xᵢ    (2.59)

• The unbiased estimator of σ² makes a degrees-of-freedom adjustment, dividing by (n − 2):

σ̂² = (1/(n − 2)) Σᵢ₌₁ⁿ ûᵢ² = SSR/(n − 2)    (2.61)



2-5c Estimating the error variance (continued)

• The quantity

σ̂ = √σ̂²    (2.62)

is called the standard error of the regression (SER).
• Because sd(β̂₁) = σ/√SSTₓ, a natural estimator of sd(β̂₁) is

se(β̂₁) = σ̂/√SSTₓ = σ̂ / [Σᵢ₌₁ⁿ (xᵢ − x̄)²]^(1/2)

which is called the standard error of β̂₁.
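Continuing the first Python sketch (reusing its x, n, x_bar, and u_hat), here is a minimal computation of σ̂², the SER, and se(β̂₁) via (2.61) and (2.62):

```python
# Sketch: error-variance estimate (2.61), SER (2.62) and se(b1),
# reusing x, n, x_bar and u_hat from the first sketch.
import math

sigma2_hat = sum(ui ** 2 for ui in u_hat) / (n - 2)  # SSR/(n - 2), eq. (2.61)
ser = math.sqrt(sigma2_hat)                          # standard error of the regression
sst_x = sum((xi - x_bar) ** 2 for xi in x)
se_b1 = ser / math.sqrt(sst_x)                       # standard error of the slope

print(f"sigma_hat = {ser:.4f}, se(b1) = {se_b1:.4f}")
```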



2-6 Regression through the origin and regression on a constant

• We now consider a slope estimator, call it β̃₁, whose regression line takes the form

ỹ = β̃₁x    (2.63)

where the tildes on y and β₁ distinguish this problem from the usual one in which the intercept and slope are estimated together. Because (2.63) passes through the point x = 0, ỹ = 0, it is called regression through the origin.



• To obtain the slope estimate in (2.63), we still rely on ordinary least squares, minimizing the sum of squared residuals

Σᵢ₌₁ⁿ (yᵢ − β̃₁xᵢ)²    (2.64)

• Using calculus, it can be shown that β̃₁ must solve the first-order condition

Σᵢ₌₁ⁿ xᵢ(yᵢ − β̃₁xᵢ) = 0    (2.65)



• Provided that not all of the xᵢ are zero, the solution is

β̃₁ = Σᵢ₌₁ⁿ xᵢyᵢ / Σᵢ₌₁ⁿ xᵢ²    (2.66)

• Note that an R² computed for regression through the origin can be negative.
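A short sketch of (2.66) on the earlier illustrative data, compared with the usual OLS slope b1:

```python
# Sketch: slope for regression through the origin, eq. (2.66),
# on the same illustrative x, y as before (b1 is the usual OLS slope).
b1_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
print(f"through-origin slope = {b1_origin:.4f} vs. usual slope = {b1:.4f}")
```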



2-7 Regression on a binary explanatory variable

• Simple regression can also be applied when x is a binary variable, usually called a dummy variable in regression analysis. As the name suggests, a binary variable takes on only the values zero and one; these two values split the units of the population into two groups, those with x = 0 and those with x = 1.

y = β₀ + β₁x + u
E(y | x = 0) = β₀    (2.70)
E(y | x = 1) = β₀ + β₁    (2.71)



• It follows immediately that

β₁ = E(y | x = 1) − E(y | x = 0)    (2.72)
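A small numerical sketch of (2.72): with a binary regressor, the OLS slope formula (2.19) reproduces the difference in group sample means exactly (data invented for illustration):

```python
# Standalone sketch: with a binary regressor, the OLS slope equals the
# difference in group sample means, eq. (2.72). Data invented for illustration.
d = [0, 0, 0, 1, 1, 1]                  # hypothetical treatment indicator
out = [2.0, 2.5, 1.5, 4.0, 3.5, 4.5]    # hypothetical outcomes

mean0 = sum(o for o, di in zip(out, d) if di == 0) / d.count(0)
mean1 = sum(o for o, di in zip(out, d) if di == 1) / d.count(1)

m = len(d)
db, ob = sum(d) / m, sum(out) / m
slope = sum((di - db) * (oi - ob) for di, oi in zip(d, out)) \
        / sum((di - db) ** 2 for di in d)

print(f"mean1 - mean0 = {mean1 - mean0:.3f}; OLS slope = {slope:.3f}")  # identical
```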



2-7a Counterfactual outcomes, causality and policy analysis
• Having introduced binary explanatory variables, this is a good time to formalize the framework of counterfactual (or potential) outcomes mentioned briefly in Chapter 1, and in particular to define a causal effect or treatment effect.
• In the simplest case, suppose we are evaluating an intervention or policy with two possible states: a unit either is subject to the intervention or it is not. Those not exposed to the intervention or new policy form the control group, while those exposed form the treatment group.



• We have no way to estimate the treatment effect teᵢ for each individual unit i. Instead, we usually focus on the average treatment effect (ATE), also called the average causal effect (ACE): the average of the treatment effects across the entire population. (The ATE is sometimes called the population average treatment effect.)
• The ATE parameter can be written as

τₐₜₑ = E[teᵢ] = E[yᵢ(1) − yᵢ(0)] = E[yᵢ(1)] − E[yᵢ(0)]    (2.76)



• Assuming that xᵢ is independent of uᵢ(0) is the same as assuming that xᵢ is independent of yᵢ(0). This assumption is guaranteed to hold only under random assignment, that is, when each unit is randomly assigned to the treatment group or the control group.
• Randomization is the cornerstone of randomized controlled trials (RCTs), long regarded as the gold standard for determining whether a medical intervention is effective.

