Chapter 02

Ch.2 The simple regression model


The Simple Regression Model
$y = \beta_0 + \beta_1 x + u$

1. Definition of the simple regression model
2. Deriving the OLS estimates
3. Mechanics of OLS
4. Units of measurement & functional form
5. Expected values & variances of OLSE
6. Regression through the origin

Econometrics 1 Econometrics 2

2.1 Definition of the model


Equation (2.1), $y = \beta_0 + \beta_1 x + u$, defines the
simple regression model.
In the model, we typically refer to
- y as the dependent variable,
- x as the independent variable,
- $\beta_0$ and $\beta_1$ as parameters, and
- u as the error term.


The Concept of Error Term

u represents factors other than x that affect y.
If the other factors in u are held fixed, so that $\Delta u = 0$, then $\Delta y = \beta_1 \Delta x$.
Ex. 2.1: $yield = \beta_0 + \beta_1\, fertilizer + u$ (2.3)
- u includes land quality, rainfall, etc.
Ex. 2.2: $wage = \beta_0 + \beta_1\, educ + u$ (2.4)
- u includes experience, ability, tenure, etc.

A Simple Assumption for u

The average value of u, the error term, in the population is 0. That is, E(u) = 0.
- This is not a restrictive assumption, since we can always use $\beta_0$ to normalize E(u) to 0.
To draw ceteris paribus conclusions about how x affects y, we have to hold all other factors (in u) fixed.


Simple Regression Model 1



Zero Conditional Mean

We need to make a crucial assumption about how u and x are related.
We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is, that
- E(u|x) = E(u) = 0 (2.5 & 2.6), which implies
- $E(y|x) = \beta_0 + \beta_1 x$ (PRF) (2.8)

E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x)

[Figure: densities f(y) drawn at x1 and x2, each centered on the line $E(y|x) = \beta_0 + \beta_1 x$; horizontal axis x, vertical axis y]

2.2 Deriving the OLSE

The basic idea of regression is to estimate the population parameters from a sample.
Let $\{(x_i, y_i): i = 1, \dots, n\}$ denote a random sample of size n from the population.
For each observation in this sample, it will be the case that
$y_i = \beta_0 + \beta_1 x_i + u_i$. (2.9)

Population regression line, sample data points and the associated error terms

[Figure: four sample points (x1, y1), ..., (x4, y4) scattered around the population regression line $E(y|x) = \beta_0 + \beta_1 x$, with the errors u1, ..., u4 shown as vertical distances from the line; horizontal axis x, vertical axis y]

Deriving OLSE using MM

To derive the OLS estimates, we need to realize that our main assumption of
- E(u|x) = E(u) = 0 also implies that
- Cov(x,u) = E(xu) = 0,
- because Cov(X,Y) = E(XY) - E(X)E(Y) (B.27).
This gives us two restrictions with which to estimate the $\beta$s:
- E(u) = 0 (2.10)
- E(xu) = 0 (2.11)

Cont. Deriving OLSE using MM

Since $u = y - \beta_0 - \beta_1 x$, we can rewrite these as
$E(u) = E(y - \beta_0 - \beta_1 x) = 0$ (2.12)
$E(xu) = E[x(y - \beta_0 - \beta_1 x)] = 0$ (2.13)
These are called moment restrictions.
- The method of moments approach to estimation imposes the population moment restrictions on the sample moments. That is, the sample estimator of E(X), the mean of a population distribution, is simply the arithmetic mean of the sample.
More Derivation of OLS

We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true.
The sample versions are as follows:
$$n^{-1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$ (2.14)
$$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$ (2.15)

Cont. More Derivation of OLS

Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as follows:
$\bar y = \hat\beta_0 + \hat\beta_1 \bar x$ (2.16) or $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ (2.17)
So the OLS estimated slope is
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$$ (2.19)

Summary of OLS slope estimate

The slope estimate is the sample covariance between x and y divided by the sample variance of x.
If x and y are positively (negatively) correlated, the slope will be positive (negative).
x needs to vary in our sample.
- See (2.18) & Figure (2.3)

More OLS

Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares.
The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (sample regression function).
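
The statement that the slope is the sample covariance divided by the sample variance can be illustrated with a short Python sketch. It assumes NumPy, and the data are hypothetical values chosen only for this example. Both moments use the same n - 1 divisor, so the divisor cancels and the ratio matches the direct formula (2.19).

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Sample covariance and sample variance, both with the n - 1 divisor.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
var_x = np.var(x, ddof=1)
slope_from_moments = cov_xy / var_x

# Direct formula (2.19) for comparison.
x_dev = x - x.mean()
slope_direct = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)

# The sign of the slope matches the sign of the sample correlation.
corr_xy = np.corrcoef(x, y)[0, 1]

print(slope_from_moments, slope_direct, corr_xy)
```

If every x in the sample were identical, `var_x` would be zero and the ratio would be undefined, which is why x needs to vary in the sample.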

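Sample regression line, sample data points and the associated estimated error terms

[Figure: four sample points (x1, y1), ..., (x4, y4) scattered around the fitted line $\hat y = \hat\beta_0 + \hat\beta_1 x$, with the residuals û1, ..., û4 shown as vertical distances from the line; horizontal axis x, vertical axis y]

Alternate approach to derivation

Given the intuitive idea of fitting a line, we can set up a formal minimization problem: choose $\hat\beta_0$ and $\hat\beta_1$ to minimize
$$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$ (2.22)
The first order conditions, which are almost the same as (2.14) & (2.15), are
$$\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$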
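
The minimization view can be sketched in Python as follows. This is an illustration under stated assumptions rather than a proof: it uses NumPy, a hypothetical five-point data set, and a helper named `ssr` introduced only for this example. It evaluates the two first order conditions at the OLS estimates and confirms that a handful of perturbed lines all have a larger sum of squared residuals.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])


def ssr(b0, b1):
    """Sum of squared residuals for a candidate line, as in (2.22)."""
    return np.sum((y - b0 - b1 * x) ** 2)


# OLS estimates from the closed-form solution.
x_dev = x - x.mean()
b1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# First order conditions evaluated at the OLS estimates (both near zero).
foc_intercept = np.sum(y - b0_hat - b1_hat * x)
foc_slope = np.sum(x * (y - b0_hat - b1_hat * x))

# SSR at the OLS line versus SSR at nearby perturbed lines.
ssr_ols = ssr(b0_hat, b1_hat)
perturbed = [
    ssr(b0_hat + d0, b1_hat + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print("SSR at OLS:", ssr_ols)
print("smallest perturbed SSR:", min(perturbed))
```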
2.3 Properties of OLS

Algebraic Properties of OLS
1. The sum of the OLS residuals is zero. Thus, the sample average of the OLS residuals is zero as well.
$$\sum_{i=1}^{n} \hat u_i = 0 \quad\text{and thus}\quad \frac{1}{n}\sum_{i=1}^{n} \hat u_i = 0$$ (2.30)

Cont. Algebraic Properties

2. The sample covariance between the regressors and the OLS residuals is zero.
$$\sum_{i=1}^{n} x_i \hat u_i = 0$$ (2.31)
3. The OLS regression line always goes through the mean of the sample.
$$\bar y = \hat\beta_0 + \hat\beta_1 \bar x$$
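
The three algebraic properties above hold in any sample by construction. A minimal Python sketch, assuming NumPy and a hypothetical data set chosen only for illustration, shows each of them holding up to floating-point rounding.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS estimates from the closed-form solution.
x_dev = x - x.mean()
b1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - (b0_hat + b1_hat * x)

# Property 1: the residuals sum to zero, eq. (2.30).
sum_resid = np.sum(u_hat)

# Property 2: the regressor and the residuals are uncorrelated in the sample, eq. (2.31).
sum_x_resid = np.sum(x * u_hat)

# Property 3: the fitted line passes through the point of sample means.
fitted_at_mean = b0_hat + b1_hat * x.mean()

print(sum_resid, sum_x_resid, fitted_at_mean, y.mean())
```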


Cont. Algebraic Properties

We can think of each observation as being made up of an explained part and an unexplained part,
$y_i = \hat y_i + \hat u_i$ (2.32)
Then we define the following:
$\sum (y_i - \bar y)^2 \equiv SST$ (2.33)
$\sum (\hat y_i - \bar y)^2 \equiv SSE$ (2.34)
$\sum \hat u_i^2 \equiv SSR$ (2.35)
Then, SST = SSE + SSR (2.36)

Goodness-of-Fit

It is useful to think about how well the sample regression line fits the sample data.
From (2.36),
$$R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$ (2.38)
$R^2$ indicates the fraction of the sample variation in $y_i$ that is explained by the model.
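
The decomposition (2.36) and the two equivalent expressions for $R^2$ in (2.38) can be sketched in Python. The example assumes NumPy and uses hypothetical data chosen only for illustration.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS fit, fitted values, and residuals.
x_dev = x - x.mean()
b1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares, eq. (2.33)
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares, eq. (2.34)
ssr = np.sum(u_hat ** 2)               # residual sum of squares, eq. (2.35)

r2_explained = sse / sst               # first form in (2.38)
r2_residual = 1.0 - ssr / sst          # second form in (2.38)

print(sst, sse, ssr, r2_explained, r2_residual)
```

For this toy sample the sums of squares are 6, 3.6, and 2.4 up to rounding, so the model explains about 60 percent of the sample variation in y.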

2.4 Measurement Units & Functional Form

If we use the model $y^* = \beta_0^* + \beta_1^* x^* + u^*$ instead of $y = \beta_0 + \beta_1 x + u$, we get
$$\hat\beta_0^* = c\,\hat\beta_0 \quad\text{and}\quad \hat\beta_1^* = \frac{c}{d}\,\hat\beta_1$$
where y* = c y and x* = d x. Similarly,
$$\hat\beta_1^* \approx \frac{\Delta y}{\Delta x}\cdot\frac{x}{y} = \hat\beta_1 \cdot \frac{x}{y}$$
where y* = ln y and x* = ln x.

2.5 Means & Variance of OLSE

Now, we view $\hat\beta_i$ as estimators for the parameters $\beta_i$ that appear in the population model, which means we study the properties of the distributions of $\hat\beta_i$ over different random samples from the population.

Unbiasedness of OLS

Unbiased estimator: An estimator whose expected value (or mean of its sampling distribution) equals the population value (regardless of the population value).
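
The effect of rescaling with y* = c y and x* = d x can be checked with a short Python sketch. It assumes NumPy; the helper `ols_simple`, the data, and the scale factors c = 100 and d = 10 are all hypothetical choices made for this example. The sketch covers only the linear rescaling, not the logarithmic case.

```python
import numpy as np


def ols_simple(x, y):
    """Return (intercept, slope) from the closed-form OLS solution."""
    x_dev = x - x.mean()
    b1 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1


# Hypothetical data and scale factors, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
c, d = 100.0, 10.0

b0, b1 = ols_simple(x, y)
b0_star, b1_star = ols_simple(d * x, c * y)  # regress c*y on d*x

print("original:", b0, b1)
print("rescaled:", b0_star, b1_star)
print("predicted by the scaling rule:", c * b0, (c / d) * b1)
```

Only the units of the coefficients change; the fit itself, and therefore $R^2$, is unaffected by such a rescaling.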

Cont. Unbiasedness of OLS

Assumptions for unbiasedness
1. Linear in parameters, as $y = \beta_0 + \beta_1 x + u$
2. Random sampling $\{(x_i, y_i): i = 1, 2, \dots, n\}$. Thus, $y_i = \beta_0 + \beta_1 x_i + u_i$
3. Sample variation in the $x_i$, thus $\sum (x_i - \bar x)^2 > 0$
4. Zero conditional mean, E(u|x) = 0

Cont. Unbiasedness of OLS

In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter.
$$\hat\beta_1 = \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2} = \beta_1 + \frac{\sum (x_i - \bar x)\, u_i}{\sum (x_i - \bar x)^2}$$ (2.49), (2.52)
then
$$E(\hat\beta_1) = \beta_1 + \frac{\sum (x_i - \bar x)\, E(u_i \mid x)}{\sum (x_i - \bar x)^2} = \beta_1$$ (2.53)
* We can also get $E(\hat\beta_0) = \beta_0$ in the same way.
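
Unbiasedness is a statement about the average of the estimates over repeated samples, which a small Monte Carlo experiment can illustrate. The Python sketch below assumes NumPy; the population values ($\beta_0 = 1$, $\beta_1 = 2$), the distributions of x and u, the sample size, and the seed are all hypothetical choices made for this example. The errors are drawn independently of x, so assumption 4 holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population and simulation settings.
beta0, beta1 = 1.0, 2.0
n, reps = 50, 5000

# Each row is one random sample of size n.
x = rng.uniform(0.0, 10.0, size=(reps, n))
u = rng.normal(0.0, 1.0, size=(reps, n))  # drawn independently of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# OLS slope and intercept for every sample, as in (2.49) and (2.17).
x_bar = x.mean(axis=1, keepdims=True)
x_dev = x - x_bar
b1_draws = np.sum(x_dev * y, axis=1) / np.sum(x_dev ** 2, axis=1)
b0_draws = y.mean(axis=1) - b1_draws * x_bar[:, 0]

print("mean of slope estimates:    ", b1_draws.mean())
print("mean of intercept estimates:", b0_draws.mean())
print("range of slope estimates:   ", b1_draws.min(), b1_draws.max())
```

The averages land close to the true values, while individual estimates vary from sample to sample.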



Unbiasedness Summary

The OLS estimators of $\beta_1$ and $\beta_0$ are unbiased.
The proof of unbiasedness depends on our 4 assumptions: if any assumption fails, then OLS is not necessarily unbiased.
Remember that unbiasedness is a description of the estimator: in a given sample our estimate may be "near" or "far" from the true parameter.

Variances of the OLS Estimators

Now we know that the sampling distribution of our estimate is centered around the true parameter.
- We want to think about how spread out this distribution is.
- It is much easier to think about this variance under an additional assumption, so assume
5. $Var(u|x) = \sigma^2$ (Homoskedasticity)


Cont. Variance of OLSE

$\sigma^2$ is also the unconditional variance, called the error variance, since
$Var(u|x) = E(u^2|x) - [E(u|x)]^2$
- E(u|x) = 0, so $\sigma^2 = E(u^2|x) = E(u^2) = Var(u)$
- And $\sigma$, the square root of the error variance, is called the standard deviation of the error.
Then we can say
$E(y|x) = \beta_0 + \beta_1 x$ and $Var(y|x) = \sigma^2$

Homoskedastic Case

[Figure: conditional densities f(y|x) at x1 and x2 with the same spread, each centered on the line $E(y|x) = \beta_0 + \beta_1 x$; horizontal axis x, vertical axis y]
Heteroskedastic Case

[Figure: conditional densities f(y|x) at x1, x2, and x3 with different spreads, each centered on the line $E(y|x) = \beta_0 + \beta_1 x$; horizontal axis x]

Cont. Variance of OLSE

$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum (x_i - \bar x)^2}$$ (2.57)
The larger the error variance, $\sigma^2$, the larger the variance of the slope estimate.
The larger the variability in the $x_i$, the smaller the variance of the slope estimate.
As a result, a larger sample size should decrease the variance of the slope estimate.
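
Formula (2.57) can be compared against a simulation. The Python sketch below assumes NumPy; the parameter values, the grid of x values, the number of replications, and the seed are hypothetical choices made for this example. The x values are held fixed across replications because (2.57) is a variance conditional on the sample values of x, and the errors are homoskedastic by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population and simulation settings.
beta0, beta1, sigma = 1.0, 2.0, 2.0
n, reps = 30, 20000

# Fixed regressor values, reused in every replication.
x = np.linspace(0.0, 10.0, n)
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

# Homoskedastic errors: the same sigma for every observation.
u = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * x + u

# Slope estimate for each replication, as in (2.49).
b1_draws = (y @ x_dev) / sst_x

var_theory = sigma ** 2 / sst_x   # eq. (2.57)
var_mc = b1_draws.var(ddof=1)     # simulated sampling variance

# Doubling the sample by repeating the same x values doubles the denominator.
x_big = np.tile(x, 2)
var_theory_big = sigma ** 2 / np.sum((x_big - x_big.mean()) ** 2)

print(var_theory, var_mc, var_theory_big)
```

The simulated variance should be close to the formula, and the doubled sample halves the theoretical variance because the total variation in x doubles.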

Estimating the Error Variance

We don't know the error variance, $\sigma^2$, because we don't observe the errors, $u_i$.
What we observe are only the residuals, $\hat u_i$, not the errors, $u_i$.
So we can use the residuals to form an estimate of the error variance.

Cont. Error Variance Estimate

$$\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_i = (\beta_0 + \beta_1 x_i + u_i) - \hat\beta_0 - \hat\beta_1 x_i = u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)\, x_i$$
Then, an unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2$$ (2.61)

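
The estimator (2.61) can be sketched in Python. The example assumes NumPy and uses hypothetical data chosen only for illustration. The divisor is n - 2 rather than n because the residuals satisfy the two restrictions (2.14) and (2.15), which leaves n - 2 degrees of freedom; the version that divides by n is shown only for comparison.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# OLS fit and residuals.
x_dev = x - x.mean()
b1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - b0_hat - b1_hat * x

ssr = np.sum(u_hat ** 2)
sigma2_hat = ssr / (n - 2)   # unbiased estimator, eq. (2.61)
sigma2_naive = ssr / n       # divides by n; smaller, shown for comparison only

print(sigma2_hat, sigma2_naive)
```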

Cont. Error Variance Estimate

$\hat\sigma = \sqrt{\hat\sigma^2}$ is the standard error of the regression.
Recall that $sd(\hat\beta_1) = \sqrt{Var(\hat\beta_1)}$. If we substitute $\hat\sigma$ for $\sigma$, then we have the standard error of $\hat\beta_1$,
$$se(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{\sum (x_i - \bar x)^2}}$$

2.6 Regression through the Origin

Now, consider the model without an intercept:
$\tilde y = \tilde\beta_1 x$ (2.63).
Solving the first order condition of the minimization problem, the OLS estimated slope is
$$\tilde\beta_1 = \frac{\sum x_i y_i}{\sum x_i^2}$$ (2.66)
* Recall that an intercept can always normalize E(u) to 0 in the model with $\beta_0$.
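
Both the standard error of the slope and the through-the-origin slope (2.66) can be sketched in Python. The example assumes NumPy and uses hypothetical data chosen only for illustration; it is a sketch of the formulas, not a full regression routine.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Model with an intercept: estimates, residuals, and standard errors.
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)
b1_hat = np.sum(x_dev * (y - y.mean())) / sst_x
b0_hat = y.mean() - b1_hat * x.mean()
u_hat = y - b0_hat - b1_hat * x

sigma_hat = np.sqrt(np.sum(u_hat ** 2) / (n - 2))  # standard error of the regression
se_b1 = sigma_hat / np.sqrt(sst_x)                 # standard error of the slope

# Model without an intercept, eq. (2.66).
b1_tilde = np.sum(x * y) / np.sum(x ** 2)
u_tilde = y - b1_tilde * x

print("with intercept:    slope", b1_hat, "se", se_b1)
print("through the origin: slope", b1_tilde)
print("sum of through-origin residuals:", np.sum(u_tilde))
```

Without an intercept the residuals need not sum to zero, since only the single condition $\sum x_i \tilde u_i = 0$ is imposed.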
