Lecture 6. Linear Regression

Uploaded by

Andrea
ECMT5001: Principles of Econometrics
Lecture 6: Linear regression

Instructor: Simon Kwok, School of Economics, The University of Sydney
Based on lecture notes by Nicolas de Roos.

Outline

1 The simple regression model
2 Ordinary least squares (OLS)
3 Algebraic properties of OLS
4 Functional form
5 Assumptions of OLS
6 Statistical properties of OLS
Simon Kwok ECMT5001 L6 1 / 46 Simon Kwok ECMT5001 L6 2 / 46

Econometric modeling

Econometric models can be used for
- estimating and testing economic relationships
- evaluating policies from government or business
- forecasting economic variables

Types of data
- cross sections: each observation is an individual at a point in time
- time series: separate observations for each time period
- pooled cross sections: cross section data over multiple time periods
- panel data: the same random sample is followed over multiple periods

The simple regression model

Explains the variable y in terms of the variable x

$$y = \beta_0 + \beta_1 x + u$$

where y is the dependent variable, $\beta_0$ the intercept, $\beta_1$ the slope, x the independent variable and u the random error.

Terminology
- y: dependent variable, explained variable, response variable
- x: independent variable, explanatory variable, regressor
- u: error term, disturbance, unobservable, residual

Interpretation: studies how y varies with x

$$\frac{\partial y}{\partial x} = \beta_1 \quad \text{as long as} \quad \frac{\partial u}{\partial x} = 0$$

- $\partial y / \partial x$: by how much does the dependent variable change if the independent variable increases by 1 unit?
- the interpretation is correct only if all other things remain equal when the independent variable is changed


Simple regression examples

Soybean yield and fertilizer

$$\text{yield} = \beta_0 + \beta_1\,\text{fertilizer} + u$$

- $\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed
- u: rainfall, land quality, presence of parasites

A (simple) wage equation

$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u$$

- $\beta_1$ measures the change in hourly wage for another year of education, holding all other factors fixed
- u: labour force experience, job tenure, work ethic, intelligence

Assumptions (introduction)

The population average of the error term is zero
- normalise the unobserved factors in the population to zero

$$E(u) = 0$$

Conditional mean independence
- the explanatory variable must not contain information about the mean of the unobserved factors

$$E(u \mid x) = 0$$

Example: wage equation
- the conditional mean independence assumption is unlikely to hold: people with more education will also be more intelligent on average

$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u, \quad \text{where } u \text{ contains e.g. intelligence}$$

The population regression function

The conditional mean independence assumption implies

$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$$

i.e. the average value of the dependent variable can be expressed as a linear function of the explanatory variable

[Figure: the population regression function]


Regression data

To estimate the regression model we need data
- a random sample of n observations

$$\{(x_i, y_i) : i = 1, \dots, n\}$$

- $x_i$: value of the explanatory variable of the ith observation
- $y_i$: value of the dependent variable of the ith observation

$(x_1, y_1)$: first observation
$(x_2, y_2)$: second observation
$(x_3, y_3)$: third observation
...
$(x_n, y_n)$: nth observation

The regression objective

Fit a regression line as well as possible through the data:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Ordinary least squares

Regression residuals

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$$

Minimise the sum of squared residuals

$$\min \sum_{i=1}^{n} \hat{u}_i^2 \;\rightarrow\; \hat{\beta}_0, \hat{\beta}_1$$

Ordinary least squares (OLS) estimates

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

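The closed-form OLS estimates above can be computed directly. The following is a minimal sketch in plain Python; the function name `ols_fit` and the five data points are made up for illustration and do not come from the lecture.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) for the model y = beta0 + beta1 * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of the slope: sum of cross-deviations of x and y from their means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of the slope: sum of squared deviations of x from its mean.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no sample variation, so the slope is undefined")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0_hat, beta1_hat = ols_fit(x, y)
print(f"intercept = {beta0_hat:.3f}, slope = {beta1_hat:.3f}")
```

The guard on `s_xx` reflects the requirement that the x values are not all identical; without sample variation in x the slope formula divides by zero.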

Ordinary least squares

Example (CEO salaries)

CEO salary and return on equity

$$\text{salary} = \beta_0 + \beta_1\,\text{roe} + u$$

- salary is in thousands of dollars
- roe is the return on equity of the CEO's firm

Fitted regression

$$\widehat{\text{salary}} = 963.191 + 18.501\,\text{roe}$$

- if the return on equity increases by 1 percentage point then salary is predicted to increase by $18,501
- is there a causal interpretation?



OLS regression line

The fitted regression line depends on the sample and will differ from the (unknown) population regression line

Ordinary least squares

Example (Wage equation)

Wages and education

$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u$$

- wage: hourly wage in dollars
- educ: years of education

Fitted regression

$$\widehat{\text{wage}} = -0.90 + 0.54\,\text{educ}$$

- one more year of education is associated with an increase in the hourly wage of $0.54
- is there a causal interpretation?


Ordinary least squares

Example (Voting outcomes)

Voting outcomes and campaign expenditures (two parties)

$$\text{voteA} = \beta_0 + \beta_1\,\text{shareA} + u$$

- voteA: percentage of vote for candidate A
- shareA: percentage of campaign expenditure by candidate A

Fitted regression

$$\widehat{\text{voteA}} = 26.81 + 0.464\,\text{shareA}$$

- if candidate A's share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote
- is there a causal interpretation?

Algebraic properties of OLS

Fitted values and residuals

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \;\; \text{(fitted or predicted values)}, \qquad \hat{u}_i = y_i - \hat{y}_i \;\; \text{(residuals)}$$

Algebraic properties of OLS regression

- $\sum_{i=1}^{n} \hat{u}_i = 0$: residuals sum to zero
- $\sum_{i=1}^{n} x_i \hat{u}_i = 0$: zero correlation between residuals and regressors
- $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$: sample averages of y and x lie on the regression line

OLS example: CEO salaries

For example, CEO number 12's salary was $526,023 lower than predicted using information on the firm's equity

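The three algebraic properties above hold in any sample by construction, which can be checked numerically. This is a small sketch using made-up data and a hypothetical helper `ols_fit`; the sums are zero only up to floating-point rounding.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    return y_bar - beta1_hat * x_bar, beta1_hat


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0_hat, beta1_hat = ols_fit(x, y)
fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Property 1: residuals sum to zero.
sum_resid = sum(residuals)
# Property 2: residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))
# Property 3: the point (x_bar, y_bar) lies on the fitted line.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
gap_at_means = y_bar - (beta0_hat + beta1_hat * x_bar)

print(sum_resid, sum_x_resid, gap_at_means)
```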

Goodness of fit

Goodness of fit
- how well does the explanatory variable explain the dependent variable?

Decomposing the variation in y

$$SST = SSE + SSR$$

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n} \hat{u}_i^2$$

- SST, total sum of squares: total variation in y
- SSE, explained sum of squares: variation explained by regression
- SSR, residual sum of squares: variation not explained by regression

R-squared (or the coefficient of determination)
- measures the fraction of the variation in y that is explained by the regression

$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$

- $R^2 = 1$ indicates a perfect fit
- $R^2 = 0$ indicates no linear relationship between x and y

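The decomposition and the two equivalent expressions for R-squared can be verified on a small sample. The sketch below follows the lecture's convention (SSE is the explained sum of squares, SSR the residual sum of squares); the data and the helper name `ols_fit` are assumptions for illustration.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    return y_bar - beta1_hat * x_bar, beta1_hat


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0_hat, beta1_hat = ols_fit(x, y)
y_bar = sum(y) / len(y)
fitted = [beta0_hat + beta1_hat * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
sse = sum((fi - y_bar) ** 2 for fi in fitted)          # variation explained by the regression
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # variation left in the residuals

r_squared = sse / sst
r_squared_alt = 1 - ssr / sst

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}, R^2 = {r_squared:.4f}")
```

Note that some textbooks swap the meaning of the SSE and SSR abbreviations, so it is worth checking the convention before comparing output across sources.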
OLS examples

CEO salary and return on equity
- the regression explains only 1.3% of the total variation in salaries

$$\widehat{\text{salary}} = 963.191 + 18.501\,\text{roe}, \qquad n = 209, \quad R^2 = 0.0132$$

Voting outcomes and campaign expenditures
- the regression explains 85.6% of the total variation in election outcomes

$$\widehat{\text{voteA}} = 26.81 + 0.464\,\text{shareA}, \qquad n = 173, \quad R^2 = 0.856$$

Caution: a high $R^2$ does not necessarily mean the regression has a causal interpretation


Units of measurement

Units of measurement are important for the interpretation of regression results
- e.g. consider the relationship

$$\text{expenditure} = \beta_0 + \beta_1\,\text{income} + u$$

- i.e. as household income rises, expenditure on food rises at a rate determined by $\beta_1$

Suppose both income and expenditure are measured in dollars
- how would we interpret $\beta_1$ and $\beta_0$?

Suppose instead that income is measured in thousands of dollars
- how would we interpret $\beta_1$ and $\beta_0$?

Functional form

We can use natural logarithms to examine non-linear relationships

1. Semi-logarithmic form
- regression of log wages on years of education

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + u$$

- log(wage) is the natural logarithm of wages
- this changes the interpretation of the regression coefficient

$$\beta_1 = \frac{\partial \log(\text{wage})}{\partial\,\text{educ}} = \frac{1}{\text{wage}} \cdot \frac{\partial\,\text{wage}}{\partial\,\text{educ}} = \frac{\partial\,\text{wage} / \text{wage}}{\partial\,\text{educ}}$$

- i.e. $100 \cdot \beta_1$ is approximately the percentage change in the wage if education is increased by 1 year

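For the units-of-measurement question above, a quick numerical experiment shows what rescaling the regressor does. This sketch uses invented household data and a hypothetical helper `ols_fit`; it regresses expenditure on income measured first in dollars and then in thousands of dollars.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    return y_bar - beta1_hat * x_bar, beta1_hat


# Hypothetical data: annual income and food expenditure, both in dollars.
income_dollars = [20000.0, 35000.0, 50000.0, 65000.0, 80000.0]
expenditure = [3000.0, 4200.0, 5100.0, 6300.0, 7400.0]

# Same incomes expressed in thousands of dollars.
income_thousands = [v / 1000.0 for v in income_dollars]

b0_dollars, b1_dollars = ols_fit(income_dollars, expenditure)
b0_thousands, b1_thousands = ols_fit(income_thousands, expenditure)

# Slope: extra dollars of expenditure per extra dollar of income.
print(f"income in dollars:   intercept = {b0_dollars:.2f}, slope = {b1_dollars:.5f}")
# Slope: extra dollars of expenditure per extra thousand dollars of income.
print(f"income in thousands: intercept = {b0_thousands:.2f}, slope = {b1_thousands:.2f}")
```

Dividing the regressor by 1000 multiplies the slope by 1000 and leaves the intercept unchanged; the fitted values, and hence the fit, are identical in both versions.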
Example: wage equation

Fitted regression

$$\widehat{\log(\text{wage})} = 0.584 + 0.083\,\text{educ}$$

The wage increases by 8.3% for each extra year of education

[Figure: wage plotted against educ, showing wage growth of 8.3% per year of education]

Functional form

2. Log-logarithmic form
- CEO salary and firm sales

$$\log(\text{salary}) = \beta_0 + \beta_1 \log(\text{sales}) + u$$

- we take natural logs of both salary and sales
- this again changes the interpretation of the regression coefficient

$$\beta_1 = \frac{\partial \log(\text{salary})}{\partial \log(\text{sales})} = \frac{\partial\,\text{salary} / \text{salary}}{\partial\,\text{sales} / \text{sales}}$$

- i.e. the percentage change in salary if sales increase by 1%
- or the elasticity of salary with respect to sales


Example: CEO salaries

Fitted regression

$$\widehat{\log(\text{salary})} = 4.822 + 0.257 \log(\text{sales})$$

- i.e. a 1% increase in sales is associated with a 0.257% increase in salary

Note that the log-log form suggests a constant elasticity

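The percentage interpretations of the two logarithmic forms are approximations based on small changes. As a rough check, the sketch below takes the fitted coefficients quoted in the lecture (0.083 for the semi-log wage equation, 0.257 for the log-log salary equation) and compares the approximate with the exact implied changes; the variable names are illustrative.

```python
import math

# Semi-log form: log(wage) = b0 + b1 * educ, with b1 = 0.083 from the fitted wage equation.
b1_educ = 0.083
approx_pct_wage = 100 * b1_educ                  # approximation: 100 * b1 percent
exact_pct_wage = 100 * (math.exp(b1_educ) - 1)   # exact change implied by one extra year

# Log-log form: log(salary) = b0 + b1 * log(sales), with b1 = 0.257 from the fitted salary equation.
b1_sales = 0.257
approx_pct_salary = b1_sales * 1.0               # approximation: b1 percent per 1% rise in sales
exact_pct_salary = 100 * (1.01 ** b1_sales - 1)  # exact change implied by a 1% rise in sales

print(f"wage:   approx {approx_pct_wage:.2f}%, exact {exact_pct_wage:.2f}%")
print(f"salary: approx {approx_pct_salary:.3f}%, exact {exact_pct_salary:.3f}%")
```

The approximation is very close for the elasticity because a 1% change is small, and slightly looser for the semi-log case, where a full year of education is a discrete change.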
OLS estimates

The estimated regression coefficients are random variables because they are calculated from a random sample
- the data $x_i$, $y_i$, $\bar{x}$, $\bar{y}$ are random and depend on the sample

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Interpretation
- $\hat{\beta}_1$ is the sample covariance between x and y divided by the sample variance of x

Key questions
- are the estimators unbiased?
- what are the variances of the estimators?

Standard assumptions for linear regression

Assumption SLR.1 (Linear in parameters)

$$y = \beta_0 + \beta_1 x + u$$

- the population relationship between y and x is linear

Assumption SLR.2 (Random sampling)

$$\{(x_i, y_i) : i = 1, \dots, n\}$$

- the data is a random sample drawn from the population

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

- therefore each data point follows the population equation

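The interpretation of the slope estimate as sample covariance over sample variance, stated above, can be confirmed numerically. This is a short sketch with made-up data; the function names are hypothetical, and both sample moments use the n - 1 divisor, which cancels in the ratio.

```python
def sample_covariance(x, y):
    """Sample covariance of x and y with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_variance(x):
    """Sample variance of x with the n - 1 divisor."""
    return sample_covariance(x, x)


def ols_slope(x, y):
    """Slope from the closed-form OLS formula (ratio of sums of deviations)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_from_moments = sample_covariance(x, y) / sample_variance(x)
slope_from_formula = ols_slope(x, y)
print(slope_from_moments, slope_from_formula)
```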

Random sampling

Consider the following hypothetical relationship between wages and education
- the population consists of all workers in country A
- there is a linear relationship between wages and years of education in the population
- randomly draw a worker from the population
- the wage and education level of the worker are random because we do not know in advance which worker will be drawn
- put the worker back into the population and repeat the random draw n times
- the wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education

Random sampling

[Figure: values $(x_i, y_i)$ drawn for the ith worker, plotted around the PRF $E(y \mid x) = \beta_0 + \beta_1 x$. The deviation from the population relationship for the ith worker is $u_i = y_i - \beta_0 - \beta_1 x_i$.]

Standard assumptions for linear regression

Assumption SLR.3 (Sample variation in explanatory variable)

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$$

- the values of the explanatory variable are not all the same

Assumption SLR.4 (Zero conditional mean)

$$E(u \mid x) = 0$$

- the explanatory variable contains no information about the mean of the unobserved factors


Statistical properties of OLS

Theorem (Unbiasedness of OLS)
If assumptions SLR.1 - SLR.4 hold then

$$E(\hat{\beta}_0) = \beta_0, \qquad E(\hat{\beta}_1) = \beta_1$$

Interpretation
- in any random sample, the estimated coefficients may be larger or smaller
- on average in repeated samples, they will be equal to the values determined by the population relationship between x and y

Standard assumptions for linear regression

Variance of the OLS estimators
- how far can we expect our estimates to be from the population values on average?
- sampling variability is measured by the variances of the estimators

$$Var(\hat{\beta}_0), \qquad Var(\hat{\beta}_1)$$

Assumption SLR.5 (Homoskedasticity)

$$Var(u \mid x) = \sigma^2$$

- the explanatory variable contains no information about the variability of the unobserved factors

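The "repeated samples" reading of the unbiasedness theorem above can be illustrated with a small Monte Carlo experiment. Everything in this sketch is an assumption chosen for illustration: the population parameters (intercept 1, slope 2), uniform x, normal errors drawn independently of x, the sample size and the number of replications.

```python
import random


def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    return y_bar - beta1_hat * x_bar, beta1_hat


def simulate_slopes(beta0, beta1, sigma, n, reps, seed):
    """Draw `reps` random samples of size n and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Errors are drawn independently of x, so E(u|x) = 0 and Var(u|x) = sigma^2.
        u = [rng.gauss(0.0, sigma) for _ in range(n)]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_fit(x, y)[1])
    return slopes


# Hypothetical population: y = 1 + 2x + u with sigma = 1.
slopes = simulate_slopes(beta0=1.0, beta1=2.0, sigma=1.0, n=50, reps=2000, seed=0)
mean_slope = sum(slopes) / len(slopes)

print(f"smallest estimate: {min(slopes):.3f}")
print(f"largest estimate:  {max(slopes):.3f}")
print(f"average estimate:  {mean_slope:.3f}")
```

Individual estimates fall on either side of the true slope, while their average across replications sits close to it, which is what unbiasedness describes.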
Homoskedasticity

[Figure: homoskedasticity]

Heteroskedasticity

[Figure: heteroskedasticity]


Statistical properties of OLS

Theorem (Variance of OLS estimators)
If assumptions SLR.1 - SLR.5 hold then

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$$

$$Var(\hat{\beta}_0) = \frac{\sigma^2 n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2 n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$

The sampling variability of the estimated coefficients will be
- larger if the variance of the unobserved factors is higher
- larger if there is less variability in the explanatory variable

Estimating the error variance

$$Var(u_i \mid x_i) = \sigma^2 = Var(u_i)$$

- the variance of u does not depend on x; it is equal to the unconditional variance

$$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2$$

- this intuitive estimator is biased!

$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} \hat{u}_i^2$$

- this estimator is unbiased; $n - 2$ is the number of degrees of freedom

Statistical properties of OLS

Theorem (Unbiasedness of the error variance)
If assumptions SLR.1 - SLR.5 hold then

$$E(\hat{\sigma}^2) = \sigma^2$$

Estimation of standard errors for regression coefficients

$$se(\hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}^2 / SST_x}$$

$$se(\hat{\beta}_0) = \sqrt{\widehat{Var}(\hat{\beta}_0)} = \sqrt{\hat{\sigma}^2 n^{-1} \sum_{i=1}^{n} x_i^2 / SST_x}$$

The estimated standard deviations of the regression coefficients are called standard errors
- they measure how precisely the regression coefficients are estimated

OLS standard errors

Calculation of standard errors
1. estimate the regression $\rightarrow \hat{\beta}_0, \hat{\beta}_1$
2. calculate the regression residuals: $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$
3. calculate an estimate of the error variance: $\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} \hat{u}_i^2$
4. use this information to calculate standard errors $\rightarrow se(\hat{\beta}_1), se(\hat{\beta}_0)$

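The four calculation steps above translate almost line by line into code. The following is a minimal sketch under the homoskedasticity assumption SLR.5; the function name `ols_with_standard_errors`, the returned dictionary keys and the data are invented for illustration.

```python
import math


def ols_with_standard_errors(x, y):
    """Follow the four steps: fit, residuals, error variance, standard errors."""
    n = len(x)
    if n <= 2:
        raise ValueError("need more than 2 observations for n - 2 degrees of freedom")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)

    # Step 1: estimate the regression coefficients.
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    beta0_hat = y_bar - beta1_hat * x_bar

    # Step 2: calculate the regression residuals.
    residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]

    # Step 3: unbiased estimate of the error variance, dividing by n - 2.
    sigma2_hat = sum(ui ** 2 for ui in residuals) / (n - 2)

    # Step 4: standard errors of the slope and the intercept.
    se_beta1 = math.sqrt(sigma2_hat / sst_x)
    se_beta0 = math.sqrt(sigma2_hat * (sum(xi ** 2 for xi in x) / n) / sst_x)

    return {
        "beta0_hat": beta0_hat,
        "beta1_hat": beta1_hat,
        "sigma2_hat": sigma2_hat,
        "se_beta0": se_beta0,
        "se_beta1": se_beta1,
    }


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = ols_with_standard_errors(x, y)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Dividing the sum of squared residuals by n rather than n - 2 in step 3 would give the biased estimator and, in small samples such as this one, noticeably smaller standard errors.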

OLS Summary

1. We seek an explanation for y in terms of x
2. We look for a linear relationship:

$$y = \beta_0 + \beta_1 x + u$$

3. Given a random sample $\{(x_i, y_i)\}_{i=1}^{n}$, we choose the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise the sum of squared residuals $\sum_{i=1}^{n} \hat{u}_i^2$
4. Under assumptions SLR.1-4, $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased
5. Under assumptions SLR.1-5, $\hat{\beta}_0$ and $\hat{\beta}_1$ are BLUE (best linear unbiased estimators)
