This document summarizes key points from a lecture on linear regression with one regressor: 1) It presents the OLS estimators for the slope (β1) and intercept (β0) coefficients and applies them to estimate the effect of class size on test scores using student data. 2) It reviews the three assumptions of the ordinary least squares method: that the error term is uncorrelated with the regressor (E(ui|Xi)=0), observations are independent and identically distributed, and there are no large outliers. 3) It describes how the OLS estimators are unbiased under the assumptions and have consistent, normally distributed sampling properties in large samples.


ECON4150 - Introductory Econometrics

Lecture 4: Linear Regression with One Regressor

Monique de Haan
(moniqued@econ.uio.no)

Stock and Watson Chapter 4



Lecture outline

• The OLS estimators

• The effect of class size on test scores

• The Least Squares Assumptions

• E (ui |Xi ) = 0

• (Xi , Yi ) are i.i.d

• Large outliers are unlikely

• Properties of the OLS estimators

• unbiasedness

• consistency

• large sample distribution

• The compulsory term paper



The OLS estimators

Question of interest: What is the effect of a change in Xi on Yi ?

Yi = β0 + β1 Xi + ui

Last week we derived the OLS estimators of β0 and β1 :

β̂0 = Ȳ − β̂1 X̄

β̂1 = [ (1/(n−1)) Σⁿᵢ₌₁ (Xi − X̄)(Yi − Ȳ) ] / [ (1/(n−1)) Σⁿᵢ₌₁ (Xi − X̄)(Xi − X̄) ] = sXY / s²X
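As a sanity check, the two formulas can be computed directly; a minimal sketch in Python (not part of the lecture's Stata material), using made-up data points that lie exactly on y = 1 + 2x:

```python
# OLS slope and intercept from the formulas above; the 1/(n-1) factors
# cancel in the ratio, so they are omitted.
def ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # slope estimate
    b0 = ybar - b1 * xbar   # intercept estimate
    return b0, b1

# Points on y = 1 + 2x: the estimators recover beta0 = 1 and beta1 = 2 exactly.
b0, b1 = ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```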

OLS estimates: The effect of class size on test scores

Question of interest: What is the effect of a change in class size on test
scores?

TestScorei = β0 + β1 ClassSizei + ui

. regress test_score class_size, robust

Linear regression                               Number of obs   =        420
                                                F(1, 418)       =      19.26
                                                Prob > F        =     0.0000
                                                R-squared       =     0.0512
                                                Root MSE        =     18.581

                             Robust
  test_score       Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

  class_size   -2.279808    .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons     698.933    10.36436    67.44   0.000     678.5602   719.3057

Estimated regression line:

TestScore-hat_i = 698.93 − 2.28 · ClassSizei
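To read the estimated coefficients off the fitted line, a quick Python check (the class sizes are hypothetical, and the rounded estimates from the output above are used):

```python
# Fitted line from the regression output above:
# TestScore-hat = 698.93 - 2.28 * ClassSize
def predicted_score(class_size):
    return 698.93 - 2.28 * class_size

# Cutting the student-teacher ratio from 25 to 20 raises the predicted
# district test score by 2.28 points per student, 11.4 points in total.
gain = predicted_score(20) - predicted_score(25)
```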

The Least Squares assumptions

Yi = β0 + β1 Xi + ui

Under what assumptions does the method of ordinary least squares provide
appropriate estimators of β0 and β1?

Under what assumptions does the method of ordinary least squares provide
an appropriate estimator of the effect of class size on test scores?

The Least Squares assumptions:


Assumption 1: The conditional mean of ui given Xi is zero

E (ui |Xi ) = 0

Assumption 2: (Yi, Xi) for i = 1, ..., n are independently and identically
distributed (i.i.d.)

Assumption 3: Large outliers are unlikely

      0 < E(Xi⁴) < ∞   and   0 < E(Yi⁴) < ∞

The Least Squares assumptions: Assumption 1

E (ui |Xi ) = 0

The first OLS assumption states that:

All other factors that affect the dependent variable Yi (contained in ui ) are
unrelated to Xi in the sense that, given a value of Xi , the mean of these other
factors equals zero.

In the class size example:

All the other factors affecting test scores should be unrelated to class size in
the sense that, given a value of class size, the mean of these other factors
equals zero.

The Least Squares assumptions: Assumption 1

The first OLS assumption can also be written as:

      E(Yi | Xi) = E(β0 + β1 Xi + ui | Xi)

expectation rules:

                 = β0 + β1 E(Xi | Xi) + E(ui | Xi)

assumption 1, E(ui | Xi) = 0:

                 = β0 + β1 Xi

The Least Squares assumptions: Assumption 1

E(Yi | Xi) = β0 + β1 Xi

The Least Squares assumptions: Assumption 1


Example of a violation of assumption 1:

Suppose that

• districts with wealthy inhabitants have small classes and good teachers:
  these districts have a lot of money which they can use to hire more and
  better teachers

• districts with poor inhabitants have large classes and bad teachers:
  these districts have little money and can hire only few and not very
  good teachers

In this case class size is related to teacher quality.

Since teacher quality likely affects test scores, it is contained in ui.

This implies a violation of assumption 1:

      E(ui | ClassSizei = small) ≠ E(ui | ClassSizei = large) ≠ 0
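The consequence of this violation can be made concrete with a small simulation (illustrative Python, not the California data; the numbers for wealth, class size, and teacher quality are invented): when the omitted teacher-quality variable is correlated with class size, the OLS slope no longer recovers the true class-size effect.

```python
import random

random.seed(0)

# True class-size effect is -1, but teacher quality (omitted, so part of u_i)
# is negatively correlated with class size: E(u_i | X_i) != 0.
n = 50_000
data = []
for _ in range(n):
    wealth = random.gauss(0, 1)
    size = 25 - 2 * wealth + random.gauss(0, 1)   # wealthy districts: smaller classes
    quality = wealth + random.gauss(0, 1)         # ...and better teachers
    score = 700 - 1 * size + 5 * quality + random.gauss(0, 5)
    data.append((size, score))

xbar = sum(s for s, _ in data) / n
ybar = sum(t for _, t in data) / n
b1 = (sum((s - xbar) * (t - ybar) for s, t in data)
      / sum((s - xbar) ** 2 for s, _ in data))
# b1 is pulled far below the true -1 (toward about -3), because small
# classes come bundled with good teachers.
```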

The Least Squares assumptions: Assumption 2

(Yi , Xi ) for i = 1, ..., n are i.i.d

• If the sample is drawn by simple random sampling, assumption 2 will hold

Example: What is the effect of mother's education (Xi) on child's education (Yi)?

Example of simple random sampling:

• randomly draw a sample of mothers with information on each mother's
  education and the education of one randomly selected child

• (Yi, Xi) for i = 1, ..., n are i.i.d.

Example of a violation of simple random sampling:

• randomly draw a sample of mothers with information on each mother's
  education and the education of all of her children

• (Yi, Xi) for i = 1, ..., n are NOT i.i.d.

• Observations on children from the same mother are not independent!

The Least Squares assumptions: Assumption 3

Large outliers are unlikely


   
      0 < E(Xi⁴) < ∞   and   0 < E(Yi⁴) < ∞

• Outliers are observations that have values far outside the usual range of
  the data

• Large outliers can make OLS regression results misleading

• Another way to state the assumption is that X and Y have finite kurtosis

• The assumption is necessary to justify the large-sample approximation to
  the sampling distribution of the OLS estimators
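A tiny Python illustration (invented numbers) of how a single large outlier can overturn the fit:

```python
# Ten points exactly on y = 2x give slope 2; adding one extreme outlier at
# (100, 0) drags the OLS slope below zero.
def slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

xs = [float(v) for v in range(10)]
ys = [2 * v for v in xs]
clean_slope = slope(xs, ys)                       # exactly 2
dirty_slope = slope(xs + [100.0], ys + [0.0])     # negative: outlier dominates
```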

The Least Squares assumptions: Assumption 3



Use of the Least Squares assumptions

Yi = β0 + β1 Xi + ui

Assumption 1: E (ui |Xi ) = 0

Assumption 2: (Yi , Xi ) for i = 1, ..., n are i.i.d

Assumption 3: Large outliers are unlikely

If the 3 least squares assumptions hold, the OLS estimators β̂0 and β̂1:

• are unbiased estimators of β0 and β1

• are consistent estimators of β0 and β1

• have a jointly normal sampling distribution

Properties of the OLS estimator: unbiasedness

Yi = β0 + β1 Xi + ui        Ȳ = β0 + β1 X̄ + ū

E(β̂1) = E[ Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)(Xi − X̄) ]

substitute for Yi and Ȳ:

       = E[ Σ (Xi − X̄)(β0 + β1 Xi + ui − (β0 + β1 X̄ + ū)) / Σ (Xi − X̄)(Xi − X̄) ]

rewrite (β0 drops out):

       = E[ Σ (Xi − X̄)(β1 (Xi − X̄) + (ui − ū)) / Σ (Xi − X̄)(Xi − X̄) ]

rewrite & use expectation rules:

       = E[ β1 Σ (Xi − X̄)(Xi − X̄) / Σ (Xi − X̄)(Xi − X̄) ] + E[ Σ (Xi − X̄)(ui − ū) / Σ (Xi − X̄)(Xi − X̄) ]

Properties of the OLS estimator: unbiasedness

E(β̂1) = E[ β1 Σ (Xi − X̄)(Xi − X̄) / Σ (Xi − X̄)(Xi − X̄) ] + E[ Σ (Xi − X̄)(ui − ū) / Σ (Xi − X̄)(Xi − X̄) ]

take β1 out of the 1st expectation, and apply the algebra trick to the 2nd:

       = β1 + E[ Σ (Xi − X̄) ui / Σ (Xi − X̄)(Xi − X̄) ]

law of iterated expectations:

       = β1 + E[ Σ (Xi − X̄) E(ui | Xi) / Σ (Xi − X̄)(Xi − X̄) ]

E(β̂1) = β1   if   E(ui | Xi) = 0

Algebra trick

Σ (Xi − X̄)(ui − ū) = Σ Xi ui − Σ Xi ū − Σ X̄ ui + Σ X̄ ū

                   = Σ Xi ui − n · ((1/n) Σ Xi) ū − Σ X̄ ui + n X̄ ū

                   = Σ Xi ui − n X̄ ū − Σ X̄ ui + n X̄ ū

                   = Σ Xi ui − Σ X̄ ui

                   = Σ (Xi − X̄) ui
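The identity is easy to verify numerically; a Python spot check with arbitrary random draws:

```python
import random

random.seed(1)

# Check: sum (Xi - Xbar)(ui - ubar) equals sum (Xi - Xbar) ui, because the
# dropped term  ubar * sum (Xi - Xbar)  is identically zero.
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
xbar, ubar = sum(x) / n, sum(u) / n
lhs = sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u))
rhs = sum((xi - xbar) * ui for xi, ui in zip(x, u))
```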

Consistency
Consistency:  β̂1 →p β1, i.e. plim β̂1 = β1

plim β̂1 = plim [ Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)(Xi − X̄) ]

        = plim [ (1/(n−1)) Σ (Xi − X̄)(Yi − Ȳ) ] / plim [ (1/(n−1)) Σ (Xi − X̄)(Xi − X̄) ]  =  plim sXY / plim s²X

law of large numbers (using OLS assumptions 2 and 3):

        = Cov(Xi, Yi) / Var(Xi)

substitute for Yi (see Key Concept 2.3):

        = Cov(Xi, β0 + β1 Xi + ui) / Var(Xi)

        = ( β1 Var(Xi) + Cov(Xi, ui) ) / Var(Xi)

Consistency

plim β̂1 = ( β1 Var(Xi) + Cov(Xi, ui) ) / Var(Xi)

        = β1 Var(Xi)/Var(Xi) + Cov(Xi, ui)/Var(Xi)

substitute the covariance expression:

        = β1 + E[(Xi − µX)(ui − µu)] / Var(Xi)

algebra trick:

        = β1 + E[(Xi − µX) ui] / Var(Xi)

law of iterated expectations:

        = β1 + E[(Xi − µX) E(ui | Xi)] / Var(Xi)

so  plim β̂1 = β1   if   E(ui | Xi) = 0

Unbiasedness vs Consistency

• Unbiasedness & consistency both rely on E(ui | Xi) = 0

• Unbiasedness implies that E(β̂1) = β1 for any given sample size n

• Consistency implies that the sampling distribution of β̂1 becomes more and
  more tightly concentrated around β1 as the sample size n grows.

Consistency: A simulation example

• Let's create a data set with 100 observations

• Xi ∼ N(0, 1)

• ui ∼ N(0, 1)

• We define Y to depend on X as: Yi = 1 + 2Xi + ui

. set obs 100
. gen x=invnorm(uniform())
. gen y=1+2*x+invnorm(uniform())

. sum y x

    Variable        Obs        Mean    Std. Dev.        Min        Max

           y        100    .6123606    2.211365    -5.05828   5.462746
           x        100   -.1479108    .9928607   -2.633841    1.80305

A simulation example

[Scatter plot of Y against X with the fitted regression line]

. regress y x

      Source         SS          df         MS       Number of obs  =     100
                                                     F(1, 98)       =  385.45
       Model   385.987671         1   385.987671     Prob > F       =  0.0000
    Residual   98.1357149        98   1.00138485     R-squared      =  0.7973
                                                     Adj R-squared  =  0.7952
       Total   484.123386        99   4.89013521     Root MSE       =  1.0007

           y       Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

           x    1.988753    .1012965    19.63   0.000     1.787733   2.189772
       _cons    .9065187    .1011847     8.96   0.000      .705721   1.107316

A simulation example n=100

We can create 999 of these data sets with 100 observations and use OLS to
estimate

Yi = β0 + β1 Xi + ui

. program define ols, rclass
  1.   drop _all
  2.   set obs 100
  3.   gen x=invnorm(uniform())
  4.   gen y=1+2*x+invnorm(uniform())
  5.   regress y x
  6. end

. simulate _b, reps(999) nodots : ols

      command:  ols

. sum

    Variable        Obs        Mean    Std. Dev.        Min        Max

        _b_x        999    1.997521    .1018595     1.67569   2.308795
     _b_cons        999    1.003246    .1019056    .6844429   1.285363

A simulation example n=100

[Histogram: OLS estimates of β1 in 999 samples with n=100; x-axis: OLS
estimates of β1, from 1.6 to 2.4]

A simulation example n=1000

. program define ols, rclass
  1.   drop _all
  2.   set obs 1000
  3.   gen x=invnorm(uniform())
  4.   gen y=1+2*x+invnorm(uniform())
  5.   regress y x
  6. end

. simulate _b, reps(999) nodots : ols

      command:  ols

. sum

    Variable        Obs        Mean    Std. Dev.        Min        Max

        _b_x        999    2.000035     .030417    1.908725   2.112585
     _b_cons        999    1.000791    .0311526    .8970624   1.088724

A simulation example n=1000

[Histogram: OLS estimates of β1 in 999 samples with n=1000; x-axis: OLS
estimates of β1, from 1.6 to 2.4]

A simulation example n=10000

. program define ols, rclass
  1.   drop _all
  2.   set obs 10000
  3.   gen x=invnorm(uniform())
  4.   gen y=1+2*x+invnorm(uniform())
  5.   regress y x
  6. end

. simulate _b, reps(999) nodots : ols

      command:  ols

. sum

    Variable        Obs        Mean    Std. Dev.        Min        Max

        _b_x        999    1.999748    .0099715    1.969678   2.034566
     _b_cons        999    1.000391    .0100135    .9699681   1.033458

A simulation example n=10000

[Histogram: OLS estimates of β1 in 999 samples with n=10000; x-axis: OLS
estimates of β1, from 1.6 to 2.4]

Consistency of the OLS estimator β̂1

True model: Yi = 1 + 2Xi + ui        Estimated model: Yi = β0 + β1 Xi + ui

[Overlaid histograms: OLS estimates of β1 in 999 samples with n=100, n=1000
and n=10000; all three distributions are centered at 2 and become more
tightly concentrated as n grows]
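The same Monte Carlo can be replicated outside Stata; a Python sketch (with fewer repetitions than the slides, for speed) showing the spread of β̂1 shrinking roughly with √n:

```python
import random
import statistics

random.seed(42)

# Draw a sample of size n from Yi = 1 + 2*Xi + ui and return the OLS slope.
def ols_slope(n):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

reps = 500
sd_100 = statistics.stdev(ols_slope(100) for _ in range(reps))
sd_1000 = statistics.stdev(ols_slope(1000) for _ in range(reps))
# Both distributions are centered on 2; the n=1000 estimates are roughly
# sqrt(10) times more tightly concentrated (about 0.10 vs 0.03).
```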

Sampling distribution of β̂0 and β̂1

We discussed the sampling distribution of the sample average Ȳ:

• The sampling distribution is complicated for small n, but if Y1, ..., Yn
  are i.i.d. we know that

      E(Ȳ) = µY

• By the Central Limit Theorem the large-sample distribution can be
  approximated by the normal distribution:

      Ȳ ∼ N(µY, σ²Y / n)

If the 3 least squares assumptions hold we can make similar statements about
the OLS estimators β̂0 and β̂1

Large-sample distribution of β̂0 and β̂1

• Technically the Central Limit Theorem concerns the large-sample
  distribution of averages (like Ȳ)

• Examining the formulas of the OLS estimators shows that they are
  functions of sample averages:

      β̂0 = Ȳ − β̂1 X̄

      β̂1 = [ (1/n) Σ (Xi − X̄)(Yi − Ȳ) ] / [ (1/n) Σ (Xi − X̄)(Xi − X̄) ]

• It turns out that the Central Limit Theorem also applies to these
  functions of sample averages.

Sampling distribution of β̂0 and β̂1

If the first least squares assumption holds:

• The OLS estimators are unbiased, which implies that (for any sample size n)

      E(β̂0) = β0   and   E(β̂1) = β1

In addition, if all 3 least squares assumptions hold:

• The Central Limit Theorem implies that β̂0 and β̂1 are approximately
  jointly normally distributed in large samples:

      β̂0 ∼ N(β0, σ²β̂0)

      β̂1 ∼ N(β1, σ²β̂1)

Large-sample distribution of β̂0 and β̂1

In large samples

      β̂0 ∼ N(β0, σ²β̂0)      β̂1 ∼ N(β1, σ²β̂1)

where it can be shown that

      σ²β̂0 = (1/n) · Var(Hi ui) / [E(Hi²)]²    with    Hi = 1 − (µX / E(Xi²)) · Xi

      σ²β̂1 = (1/n) · Var[(Xi − µX) ui] / [Var(Xi)]²

The expression for σ²β̂1 shows that the larger the variation in the regressor
Xi, the smaller the variance of β̂1

Large-sample distribution of β̂0 and β̂1

• When Var(Xi) is low, it is difficult to obtain an accurate estimate of the
  effect of X on Y, which implies that Var(β̂1) = σ²β̂1 is high.

• If there is more variation in X, then there is more information in the
  data that you can use to fit the regression line.
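This last point can be illustrated with a small simulation (Python, made-up design): the same model and the same noise, but with X drawn with a small versus a large standard deviation.

```python
import random
import statistics

random.seed(7)

# OLS slope from one simulated sample where X has standard deviation x_sd.
def ols_slope(x_sd, n=200):
    x = [random.gauss(0, x_sd) for _ in range(n)]
    y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

reps = 500
sd_low = statistics.stdev(ols_slope(0.5) for _ in range(reps))   # little variation in X
sd_high = statistics.stdev(ols_slope(2.0) for _ in range(reps))  # lots of variation in X
# Quadrupling sd(X) cuts the spread of the slope estimates to about a quarter.
```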

Compulsory term paper

• Traffic fatalities are the leading cause of death for Americans between
the ages of 5 and 32.

• The government wants to decrease the number of traffic fatalities by


increasing seat belt usage.

• If many people wear seat belts the chance that people die in a car crash
is likely smaller.

• People who wear seat belts might however be more careful drivers.

• Regions with many seat belt users might have fewer traffic fatalities not
because of the seat belt usage but because the drivers are more careful.

Compulsory term paper

• In the term paper you are going to investigate the following research
question.

What is the causal effect of seat belt usage on traffic fatalities?

• This research question can be addressed by using the data set


seatbelts.dta.

• Data consists of a panel of 50 U.S. States, plus the District of Columbia,


for the years 1983-1997.

• The data set can be downloaded from the course website.

• In analyzing this data you may consider the use of panel data methods
on top of a pure cross-section analysis.

Compulsory term paper

The term paper should consist of the following sections:

• Introduction
• Empirical approach
• Data
• Results
• Conclusion
• References
• Appendix with Stata code & output

The term paper should be at most 10 pages including tables and figures (but
excluding the Stata code and output).

The quality (and not the quantity) of the content of the term paper will
determine your grade.

Compulsory term paper

You are expected to work in a group of two students.

• You can form a group of two students yourself

• Register this group before 29 January 2017 00:00, using the link in the
  email you will receive today.

• If you are unable to form a group, please let me know before 29 January
  2017; you will be randomly assigned to another student.

Important dates:

• 25 January 2017– Hand-out of term paper


• 22 March 2017 – Hand-in of term paper on Fronter
• 11 April 2017 – Notification of grade (pass/fail)
• 21 April 2017 – Hand-in of improved term paper for those who failed
• 4 May 2017– Everyone is informed about final grade for term paper
