
ECON 120B Econometrics

Lecture 2 Part 2: Simple Linear Regression (One Regressor)

Xinwei Ma

Department of Economics
UC San Diego

Spring 2021



Outline

Definition of the Simple Linear Regression Model

Deriving the Ordinary Least Squares Estimates



Definition of the Simple Linear Regression Model

• We begin with cross-sectional analysis and will assume we have a random sample from
  the population of interest.

• Once we have a population in mind, the premise is the following: there are two
  variables, X and Y, and we would like to study "how Y varies with changes in X."

• We have seen examples: X is class size, and Y is student test score.

• We must confront three issues:

  1. How can we allow for factors other than X to affect Y? There is never an exact
     relationship between two variables (in interesting cases).

  2. What is the functional relationship between Y and X?

  3. How can we be sure we are capturing a causal relationship between Y and X (as is so
     often the goal)?



Definition of the Simple Linear Regression Model

• Consider the following equation relating Y to X:

      Y = β0 + β1 X + u,

  which is assumed to hold in the population of interest.

• This equation defines the simple linear regression model (a.k.a. the two-variable
  regression model, or regression with one regressor).

• The term "regression" has historical roots in the "regression to the mean" phenomenon.

• Y and X are not treated symmetrically. We want to explain Y in terms of X. From a
  causality standpoint, it makes no sense to "explain" class size in terms of student
  performance.

• As another example, we want to explain future income/earnings (Y) in terms of
  educational attainment (X), not the other way around.



Definition of the Simple Linear Regression Model

• Terminology:

      Y                          X
      -----------------------    ------------------------
      dependent variable         independent variable
      explained variable         explanatory variable
      response variable          feature
      predicted variable         predictor
      regressand                 regressor
      left-hand side variable    right-hand side variable
                                 control variable
                                 covariate

• Remark: "dependent" and "independent" are used quite often. They should not be
  confused with the notion of statistical independence.



Definition of the Simple Linear Regression Model

• We mentioned the error term, u, before. The equation

      Y = β0 + β1 X + u

  explicitly allows for other factors, contained in u, to affect Y.

• This equation also addresses the functional form issue (in a simple way).

  - Y is assumed to be linearly related to X.

  - β0 is the intercept parameter and β1 is the slope parameter. These describe a
    population, and our ultimate goal is to estimate them.

• The equation also addresses the causality issue. In this model, all other factors that
  affect Y are in u. We want to know how Y changes when X changes, holding u fixed.

• Let ∆ denote "change." Then holding u fixed means ∆u = 0. So

      ∆Y = β1 ∆X + ∆u
         = β1 ∆X        (when ∆u = 0).

  - This equation provides an interpretation of β1 as a slope, provided that we can
    "hold all other factors fixed" (∆u = 0).
Definition of the Simple Linear Regression Model

• EXAMPLE. Class size and test score

  - Our model is

        testScore = β0 + β1 classSize + u,

    where u contains other factors such as textbooks, teacher's experience, school
    district, etc.

  - If we can "hold u fixed," then the effect of class size on test score is

        ∆testScore = β1 ∆classSize,   when ∆u = 0.

  - How should we interpret the three issues?

    1. How can we allow for factors other than X to affect Y?

    2. What is the functional relationship between Y and X?

    3. How can we be sure we are capturing a causal relationship between Y and X?



Definition of the Simple Linear Regression Model

• We said we must confront three issues:

  1. How can we allow for factors other than X to affect Y?

  2. What is the functional relationship between Y and X?

  3. How can we be sure we are capturing a causal relationship between Y and X?

• We have argued that the simple regression model

      Y = β0 + β1 X + u

  addresses each of them.

• This seems too easy! How can we hope to estimate the causal effect of X on Y in general
  when we have assumed all other factors affecting Y are unobserved and lumped into u?

• The key is that the simple linear regression model is a population model. When it comes
  to estimating β1 (and β0) using data, we must restrict how u and X are related to each
  other.



Definition of the Simple Linear Regression Model
• But X and u are properly viewed as having distributions in the population.

  - For example, if X = classSize, then, in principle, we could figure out its
    distribution in the population. Suppose u is student ability. Assuming we can
    measure what that means, it also has a distribution in the population.

• What we must do is restrict the way u and X relate to each other in the population.

• Simplifying assumption: without loss of generality, the average, or expected value, of
  u is zero in the population:

      E[u] = 0.

• Normalizing average "ability" to be zero in the population should be harmless.

• The presence of β0 in

      Y = β0 + β1 X + u

  allows us to assume E[u] = 0. If the average of u is different from zero, we just
  adjust the intercept, leaving the slope the same. If α0 = E[u], then we can write

      Y = (β0 + α0) + β1 X + (u − α0),
          ----------         --------
          new intercept      new error term

  where the new error term, u − α0, has a zero mean.

• The new intercept is β0 + α0. Key point: the slope, β1, has not changed.
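
• A minimal simulation sketch in Python (all numerical values are made up for
  illustration): a least-squares line fitted to data whose error has mean α0 recovers the
  slope β1 unchanged, while the intercept picks up β0 + α0.

      # A nonzero error mean is absorbed by the intercept; the slope is unaffected.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000
      beta0, beta1, alpha0 = 2.0, 0.5, 3.0           # alpha0 = E[u]; all values are made up

      X = rng.normal(size=n)
      u = rng.normal(loc=alpha0, scale=1.0, size=n)  # error with mean alpha0, independent of X
      Y = beta0 + beta1 * X + u

      # Least-squares line fitted to (X, Y); np.polyfit returns [slope, intercept] for degree 1
      slope, intercept = np.polyfit(X, Y, 1)
      print(slope)      # close to 0.5 = beta1 (unchanged)
      print(intercept)  # close to 5.0 = beta0 + alpha0 (the "new intercept")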
Definition of the Simple Linear Regression Model

• EXAMPLE. Suppose u is "teacher experience" and X is "class size." We need, for example,

      E[teacherExp | classSize = 20] = E[teacherExp | classSize = 40],

  so that the average teacher experience is the same across different class sizes.

  - What if better school districts attract more experienced teachers and have smaller
    classes?

• EXAMPLE. Suppose u is "student ability" and X is "class size." We need, for example,

      E[studentAbi | classSize = 20] = E[studentAbi | classSize = 40],

  so that the average student ability is the same across different class sizes.

  - What if students attending large classes are better prepared because they anticipate
    less in-class interaction?

• Because the conditional expectation is a linear operation, E[u|X] = 0 implies

      E[Y|X] = β0 + β1 X + E[u|X] = β0 + β1 X,

  which shows the population regression function is a linear function of X.

• We call E[u|X] = 0 the zero conditional mean assumption. Alternatively, we call
  E[Y|X] = β0 + β1 X the linear conditional mean assumption.
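
• A short simulation sketch (with invented data-generating values) makes the linear
  conditional mean property concrete: when u is drawn independently of X, so that
  E[u|X] = 0, the average of Y within bins of X tracks β0 + β1 X.

      import numpy as np

      rng = np.random.default_rng(1)
      n = 200_000
      beta0, beta1 = 700.0, -2.0        # made-up coefficients, in the spirit of the class-size example

      X = rng.uniform(10, 30, size=n)   # e.g., student-teacher ratios
      u = rng.normal(scale=5.0, size=n) # drawn independently of X, so E[u | X] = 0
      Y = beta0 + beta1 * X + u

      # Compare the average of Y within bins of X to the population regression function
      bins = np.linspace(10, 30, 11)
      for lo, hi in zip(bins[:-1], bins[1:]):
          mask = (X >= lo) & (X < hi)
          mid = 0.5 * (lo + hi)
          print(f"X in [{lo:.0f}, {hi:.0f}): mean Y = {Y[mask].mean():7.2f},  "
                f"beta0 + beta1 * mid = {beta0 + beta1 * mid:7.2f}")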
Outline

Definition of the Simple Linear Regression Model

Deriving the Ordinary Least Squares Estimates



Deriving the Ordinary Least Squares Estimates

• Given data on X and Y, how can we estimate the population parameters, β0 and β1?

• Let {(Xi, Yi) : i = 1, 2, ..., n} be a sample of size n (the number of observations)
  from the population. Think of this as a random sample (meaning that the observations
  are independently and identically distributed).

• Plug any observation into the population equation:

      Yi = β0 + β1 Xi + ui,    i = 1, 2, ..., n,

  where the i subscript indicates a particular observation.

• We observe Yi and Xi, but not the error term ui. (However, we know ui is there.)



Deriving the Ordinary Least Squares Estimates

• We use the two restrictions

      E[u] = 0,    E[uX] = 0

  to motivate how β0 and β1 can be estimated.

  - Both conditions are implied by the zero conditional mean assumption: E[u|X] = 0.

  - E[u] = E[E[u|X]] = E[0] = 0, because E[u|X] = 0.

  - E[uX] = E[E[uX|X]] = E[E[u|X] · X] = E[0 · X] = 0, because again E[u|X] = 0.

• Next, we plug u = Y − β0 − β1 X into the two restrictions:

      0 = E[Y − β0 − β1 X],
      0 = E[X(Y − β0 − β1 X)].

• These are the two conditions in the population that determine β0 and β1. So we use
  their sample analogs, which is a method of moments approach to estimation.



Deriving the Ordinary Least Squares Estimates

• Recall that from the zero conditional mean assumption, E[u|X] = 0, we obtained the two
  population conditions

      0 = E[Y − β0 − β1 X],    0 = E[X(Y − β0 − β1 X)].

• By the law of large numbers, we expect the sample analogues to be close to zero:

      0 ≈ (1/n) Σ_{i=1}^n (Yi − β0 − β1 Xi),    0 ≈ (1/n) Σ_{i=1}^n Xi (Yi − β0 − β1 Xi).

  They will not be exactly zero due to sample variation. Remember: we are dealing with a
  (random) sample, not the population.

• The two estimates, β̂0 and β̂1, are defined by requiring the sample analogues to be
  exactly zero:

      0 = (1/n) Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi),    0 = (1/n) Σ_{i=1}^n Xi (Yi − β̂0 − β̂1 Xi).

• Later, we will show that the two estimates, β̂0 and β̂1, are close to their true
  population values, β0 and β1.
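
• As a concrete sketch of this "set the sample analogues to zero and solve" step (using
  simulated, made-up data), the two conditions form a 2-by-2 linear system in β̂0 and β̂1,
  and its solution coincides with the usual least-squares fit:

      import numpy as np

      rng = np.random.default_rng(2)
      n = 1_000
      X = rng.normal(size=n)
      Y = 1.0 + 2.0 * X + rng.normal(size=n)   # hypothetical truth: beta0 = 1, beta1 = 2

      # Sample analogues of the two moment conditions:
      #   b0 + b1 * mean(X)             = mean(Y)
      #   b0 * mean(X) + b1 * mean(X^2) = mean(X * Y)
      A = np.array([[1.0,      X.mean()],
                    [X.mean(), np.mean(X ** 2)]])
      b = np.array([Y.mean(), np.mean(X * Y)])

      b0_hat, b1_hat = np.linalg.solve(A, b)
      print(b0_hat, b1_hat)            # close to (1, 2)
      print(np.polyfit(X, Y, 1))       # same fit: [b1_hat, b0_hat], up to rounding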



Deriving the Ordinary Least Squares Estimates
• From the first equation, we have

      0 = (1/n) Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi)

        = (1/n) Σ_{i=1}^n Yi − (1/n) Σ_{i=1}^n β̂0 − (1/n) Σ_{i=1}^n β̂1 Xi    (take the average term by term)

        = Ȳ − β̂0 − β̂1 X̄.                                                    (β̂0 and β̂1 do not depend on i)

• From the second equation, we have

      0 = (1/n) Σ_{i=1}^n Xi (Yi − β̂0 − β̂1 Xi)

        = (1/n) Σ_{i=1}^n Xi Yi − (1/n) Σ_{i=1}^n β̂0 Xi − (1/n) Σ_{i=1}^n β̂1 Xi²    (take the average term by term)

        = (1/n) Σ_{i=1}^n Xi Yi − β̂0 X̄ − β̂1 (1/n) Σ_{i=1}^n Xi².                    (β̂0 and β̂1 do not depend on i)

  Here X̄ = (1/n) Σ_{i=1}^n Xi and Ȳ = (1/n) Σ_{i=1}^n Yi denote the sample averages.
Deriving the Ordinary Least Squares Estimates

• We have two equations and two unknowns:

      Equation 1:  β̂0 + β̂1 X̄ = Ȳ;
      Equation 2:  β̂0 X̄ + β̂1 (1/n) Σ_{i=1}^n Xi² = (1/n) Σ_{i=1}^n Xi Yi.

• Multiply Equation 1 by X̄ and subtract it from Equation 2:

      (1/n) Σ_{i=1}^n Xi Yi − X̄ Ȳ = [β̂0 X̄ + β̂1 (1/n) Σ_{i=1}^n Xi²] − X̄ [β̂0 + β̂1 X̄]

                                   = β̂1 [(1/n) Σ_{i=1}^n Xi² − X̄²].

• We further simplify both sides:

      (1/n) Σ_{i=1}^n Xi Yi − X̄ Ȳ = (1/n) Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) = sample covariance between Xi and Yi,

  and

      (1/n) Σ_{i=1}^n Xi² − X̄² = (1/n) Σ_{i=1}^n (Xi − X̄)² = sample variance of Xi.

• Verify the above!
Deriving the Ordinary Least Squares Estimates

• Do not lose the big picture. We started from the zero conditional mean assumption,

      E[u|X] = 0,

  and obtained the two population conditions

      0 = E[Y − β0 − β1 X],    0 = E[X(Y − β0 − β1 X)].

• In order to estimate β0 and β1, we rely on a sample, {(Xi, Yi) : i = 1, 2, ..., n}, and
  replace the population conditions by their sample analogues:

      0 = (1/n) Σ_{i=1}^n Yi − β̂0 − β̂1 (1/n) Σ_{i=1}^n Xi,
      0 = (1/n) Σ_{i=1}^n Xi Yi − β̂0 (1/n) Σ_{i=1}^n Xi − β̂1 (1/n) Σ_{i=1}^n Xi².

• Finally, the slope estimate β̂1 is solved as

             (1/n) Σ_{i=1}^n Xi Yi − [(1/n) Σ_{i=1}^n Xi] [(1/n) Σ_{i=1}^n Yi]
      β̂1 = ------------------------------------------------------------------
                      (1/n) Σ_{i=1}^n Xi² − [(1/n) Σ_{i=1}^n Xi]²

             (1/n) Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ)       sample covariance between Xi and Yi
          = ----------------------------------  =  -------------------------------------.
               (1/n) Σ_{i=1}^n (Xi − X̄)²                   sample variance of Xi
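
• This formula translates directly into a few lines of code. A minimal sketch with
  simulated, made-up data; the 1/n factors cancel in the ratio, so plain sums suffice:

      import numpy as np

      rng = np.random.default_rng(3)
      n = 500
      X = rng.uniform(0, 10, size=n)
      Y = 3.0 - 0.7 * X + rng.normal(size=n)   # hypothetical truth: beta0 = 3, beta1 = -0.7

      x_bar, y_bar = X.mean(), Y.mean()
      b1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)  # cov / var
      b0_hat = y_bar - b1_hat * x_bar                                        # intercept estimate
      print(b0_hat, b1_hat)                    # close to (3, -0.7)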
Deriving the Ordinary Least Squares Estimates

• The previous formula for β̂1 is important. It shows us how to take the data we have and
  compute the slope estimate. For reasons we will see, β̂1 is called the ordinary least
  squares (OLS) slope estimate. We often refer to it simply as the slope estimate.

• The slope estimate can be computed whenever the sample variance of Xi is not zero,
  which only rules out the case where each Xi takes the same value in the sample.

  - This makes sense, because we cannot hope to learn "how Y responds to changes in X" if
    we never observe any change in X.

• Once we have β̂1, we compute β̂0 = Ȳ − β̂1 X̄. This is the OLS intercept estimate.

• The calculation is tedious even for small n (sample size). These days, one lets a
  computer do the calculations.



Deriving the Ordinary Least Squares Estimates

• Once we have the numbers β̂0 and β̂1 for a given data set, we write the regression line
  as a function of X:

      Ŷ = β̂0 + β̂1 X.

• The regression line allows us to predict Y for any (sensible) value of X. It is also
  called the sample regression function.

• The intercept, β̂0, is the predicted Y when X = 0. (The prediction is usually
  meaningless if X = 0 is not possible. Consider the class size example.)

• The slope, β̂1, allows us to predict changes in Y for any (reasonable) change in X:

      ∆Ŷ = β̂1 ∆X.

• If ∆X = 1, so that X increases by one unit, then ∆Ŷ = β̂1.
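
• A small usage sketch (the coefficient values are the class-size estimates reported on
  the next slide, used purely for illustration): the sample regression function is just a
  function of X.

      # Prediction with a fitted line
      b0_hat, b1_hat = 698.9, -2.28

      def predict(x):
          """Predicted Y (Y-hat) at a given value of X."""
          return b0_hat + b1_hat * x

      print(predict(20.0))                   # Y-hat at X = 20
      print(predict(21.0) - predict(20.0))   # a one-unit increase in X changes Y-hat by b1_hat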



Deriving the Ordinary Least Squares Estimates

• EXAMPLE. Class size and test score

  - Test scores and class sizes in 1990 in 420 California school districts that serve
    kindergarten through eighth grade.

  - Test score: district-wide average of reading and math scores for fifth graders.

  - Class size: district-wide student-to-teacher ratio.

• The estimated equation is

      testScore-hat = 698.9 − 2.28 stuTeacherRatio

• The intercept is meaningless. Literally, it says that the average score is predicted to
  be 698.9 in a district with no students.

• The slope suggests a 2.28 increase in test score with each one-unit decrease in the
  student-to-teacher ratio.

  [Figure: scatter plot of Average Test Score (roughly 600 to 720) against Student-Teacher
  Ratio (roughly 10 to 30) for the 420 districts, with the fitted regression line overlaid.]
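
• A sketch of how one might reproduce this regression with statsmodels, assuming the
  district-level data are available locally; the file name and column names below are
  hypothetical.

      import pandas as pd
      import statsmodels.formula.api as smf

      # Hypothetical file with one row per district and hypothetical column names
      df = pd.read_csv("caschool.csv")
      fit = smf.ols("test_score ~ student_teacher_ratio", data=df).fit()

      print(fit.params)      # intercept and slope; roughly 698.9 and -2.28 for these data
      print(fit.summary())   # full regression output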



Deriving the Ordinary Least Squares Estimates

• EXAMPLE. Education and wage

  - Information on the wages and education of 526 individuals in the workforce in 1976.

  - Wage: hourly wage measured in dollars. (We need to adjust for inflation by a factor
    of 4.22 to express wages in 2016 dollars.)

  - Education: measured in years.

• The estimated equation is

      wage-hat = −0.90 + 0.54 educ

• The intercept is meaningless. Literally, it says that the average wage is predicted to
  be −$0.90 for individuals who never attended school.

• The slope suggests a $0.54 increase in hourly wage with each additional year of
  education.

  [Figure: scatter plot of average hourly earnings (roughly $0 to $30) against years of
  education (0 to 20) for the 526 individuals, with the fitted regression line overlaid.]
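
• A quick follow-up sketch: plugging the reported coefficients into the fitted line
  compares predicted wages at a few (arbitrarily chosen) education levels, using the 4.22
  inflation factor quoted above.

      # Predictions from the reported fit, wage-hat = -0.90 + 0.54 * educ (1976 dollars)
      b0_hat, b1_hat = -0.90, 0.54

      for educ in (8, 12, 16):
          wage_hat = b0_hat + b1_hat * educ
          print(f"{educ} years of education: predicted hourly wage ${wage_hat:.2f} "
                f"(about ${wage_hat * 4.22:.2f} in 2016 dollars)")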



The lectures and course materials, including slides, tests, outlines, and similar
materials, are protected by U.S. copyright law and by University policy. You may take
notes and make copies of course materials for your own use. You may also share those
materials with another student who is enrolled in or auditing this course.

You may not reproduce, distribute or display (post/upload) lecture notes or recordings or
course materials in any other way – whether or not a fee is charged – without my written
consent. You also may not allow others to do so.

If you do so, you may be subject to student conduct proceedings under the UC San
Diego Student Code of Conduct.

© Xinwei Ma 2021
x1ma@ucsd.edu

