Lecture 6. Linear Regression
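Based on lecture notes by Nicolas de Roos.
Outline
1 The simple regression model
2 Ordinary least squares (OLS)
3 Algebraic properties of OLS
4 Functional form
5 Assumptions of OLS
6 Statistical properties of OLS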
Simon Kwok ECMT5001 L6 1 / 46 Simon Kwok ECMT5001 L6 2 / 46
The simple regression model
Explains the variable y in terms of the variable x
$y = \beta_0 + \beta_1 x + u$
y: dependent variable; x: independent variable; $\beta_0$: intercept; $\beta_1$: slope; u: random error
Terminology
y: dependent variable, explained variable, response variable
x: independent variable, explanatory variable, regressor
u: error term, disturbance, unobservable, residual
Interpretation: studies how y varies with x
$\frac{\partial y}{\partial x} = \beta_1$ as long as $\frac{\partial u}{\partial x} = 0$
$\frac{\partial y}{\partial x}$: by how much does the dependent variable change if the independent variable increases by 1 unit?
the interpretation is correct only if all other things remain equal when the independent variable is changed
The population regression function
If the error has zero conditional mean, $E(u \mid x) = 0$, then
$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x)$
$= \beta_0 + \beta_1 x + E(u \mid x)$
$= \beta_0 + \beta_1 x$
Ordinary least squares (OLS)
Data: $\{(x_i, y_i) : i = 1, \dots, n\}$
$x_i$: value of the explanatory variable of the $i$th observation
$y_i$: value of the dependent variable of the $i$th observation
The regression objective
Fit a regression line as well as possible through the data:
$\hat{y} = \hat\beta_0 + \hat\beta_1 x$
Ordinary least squares
Regression residuals
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$
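The surviving slide text gives the residuals but not the criterion that makes the fit "as good as possible". As a sketch, assuming the standard least-squares criterion (the symbols $b_0$, $b_1$ are introduced here only as candidate values for the intercept and slope), OLS chooses the line that minimises the sum of squared residuals:

```latex
\begin{align*}
(\hat\beta_0, \hat\beta_1)
  &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\text{first-order conditions:}\quad
  &\sum_{i=1}^{n} \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0,
  \qquad
  \sum_{i=1}^{n} x_i \bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0
\end{align*}
```

In residual notation the two conditions read $\sum_i \hat{u}_i = 0$ and $\sum_i x_i \hat{u}_i = 0$.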
$salary = \beta_0 + \beta_1\,roe + u$
$\widehat{salary} = 963.191 + 18.501\,roe$
$wage = \beta_0 + \beta_1\,educ + u$
$\widehat{wage} = -0.90 + 0.54\,educ$
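A fitted line is used by plugging in a value of the regressor. The sketch below does this for the estimated salary equation; the roe values are hypothetical inputs chosen only for illustration, and no units are assumed beyond those of the original data.

```python
def predict_salary(roe: float) -> float:
    """Fitted value from the estimated line: salary_hat = 963.191 + 18.501 * roe."""
    return 963.191 + 18.501 * roe


# Hypothetical roe values, chosen only to illustrate the calculation.
for roe in (0.0, 10.0, 30.0):
    print(f"roe = {roe:5.1f}  ->  fitted salary = {predict_salary(roe):.3f}")

# Raising roe by one unit changes the fitted salary by the slope, 18.501.
slope_effect = predict_salary(11.0) - predict_salary(10.0)
print(f"effect of one extra unit of roe: {slope_effect:.3f}")
```

The intercept is the fitted salary at roe = 0, and the slope is the change in fitted salary per unit of roe.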
Algebraic properties of OLS
Fitted values and residuals
OLS example: CEO salaries
OLS examples
$\widehat{salary} = 963.191 + 18.501\,roe$, $n = 209$, $R^2 = 0.0132$
$\widehat{voteA} = 26.81 + 0.464\,shareA$, $n = 173$, $R^2 = 0.856$
Caution: a high $R^2$ does not necessarily mean the regression has a causal interpretation
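The examples above report $R^2$ without its formula. A minimal sketch, assuming the usual definition $R^2 = 1 - SSR/SST$ (SSR: sum of squared residuals, SST: total sum of squares of y around its mean). The data and the helper names `ols_simple` and `r_squared` are hypothetical.

```python
def ols_simple(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R^2 = 1 - SSR / SST for the fitted simple regression of y on x."""
    b0, b1 = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in residuals)        # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation in y
    return 1.0 - ssr / sst


# Hypothetical data that lie close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
r2 = r_squared(x, y)
print(f"R^2 = {r2:.4f}, sum of residuals = {residual_sum:.2e}")
```

$R^2$ measures only how much of the sample variation in y the line accounts for. It says nothing about whether x causes y, which is the point of the caution above.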
Units of measurement are important for the interpretation of regression results
e.g. consider the relationship between household income and expenditure on food
i.e. as household income rises, expenditure on food rises at a rate determined by $\beta_1$
Suppose both income and expenditure are measured in dollars
how would we interpret $\beta_1$ and $\beta_0$?
Suppose instead that income is measured in thousands of dollars
how would we interpret $\beta_1$ and $\beta_0$?
Functional form
We can use natural logarithms to examine non-linear relationships
1. Semi-logarithmic form
regression of log wages on years of education
log(wage) is the natural logarithm of wages
this changes the interpretation of the regression coefficient
$\beta_1 = \frac{\partial \log(wage)}{\partial educ} = \frac{1}{wage}\,\frac{\partial wage}{\partial educ} = \frac{\partial wage / wage}{\partial educ}$
i.e. $100 \cdot \beta_1$ is the approximate percentage change in the wage if education is increased by 1 year
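Returning to the units-of-measurement question about income and food expenditure: one way to see the answer is to fit the same regression twice, once with income in dollars and once in thousands of dollars. The household figures below are hypothetical, and `ols_simple` is an illustrative helper rather than anything from the lecture.

```python
def ols_simple(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    return y_bar - slope * x_bar, slope


# Hypothetical household data, both variables measured in dollars.
income_dollars = [20000.0, 35000.0, 50000.0, 65000.0, 80000.0]
food_dollars = [3100.0, 4300.0, 5200.0, 6400.0, 7100.0]

b0_d, b1_d = ols_simple(income_dollars, food_dollars)

# Same data, but income is now measured in thousands of dollars.
income_thousands = [v / 1000.0 for v in income_dollars]
b0_k, b1_k = ols_simple(income_thousands, food_dollars)

print(f"income in dollars:   intercept = {b0_d:.2f}, slope = {b1_d:.5f}")
print(f"income in thousands: intercept = {b0_k:.2f}, slope = {b1_k:.2f}")
```

With income in dollars, the slope is the change in food spending (in dollars) per extra dollar of income. With income in thousands, the slope is 1000 times larger because it now refers to an extra $1000 of income. The intercept, the fitted food spending at zero income, is unchanged.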
Example: wage equation
Fitted regression
$\widehat{\log(wage)} = 0.584 + 0.083\,educ$
The wage increases by approximately 8.3% for each extra year of education
[Figure: wage plotted against educ; annotation: wage growth of 8.3% per year of education]
Functional form
2. Log-logarithmic form
CEO salary and firm sales
$\log(salary) = \beta_0 + \beta_1 \log(sales) + u$
we take natural logs of both salary and sales
this again changes the interpretation of the regression coefficient
$\beta_1 = \frac{\partial \log(salary)}{\partial \log(sales)} = \frac{\partial salary / salary}{\partial sales / sales}$
i.e. the percentage change in salary if sales increase by 1%
or the elasticity of salary with respect to sales
Note that the log-log form implies a constant elasticity
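Both log specifications can be checked numerically. The sketch below uses the fitted semi-log slope 0.083 from the wage example. For the log-log form it uses hypothetical coefficients (b0 = 4.8, b1 = 0.25), because no estimates for the salary and sales equation are given here.

```python
import math

# Semi-log: log(wage) = 0.584 + 0.083 * educ (fitted line from the wage example).
b1_semilog = 0.083
approx_pct = 100 * b1_semilog                   # the 100 * beta_1 approximation
exact_pct = 100 * (math.exp(b1_semilog) - 1)    # exact change implied by the log model
print(f"semi-log: approx {approx_pct:.2f}%, exact {exact_pct:.2f}% per year of education")


# Log-log: log(salary) = b0 + b1 * log(sales); b0 and b1 are hypothetical values.
def salary(sales, b0=4.8, b1=0.25):
    return math.exp(b0 + b1 * math.log(sales))


def pct_change_for_1pct_more_sales(sales):
    """Percentage change in salary when sales rise by 1% from the given level."""
    return 100 * (salary(1.01 * sales) / salary(sales) - 1)


for level in (1_000.0, 1_000_000.0):
    print(f"sales = {level:>11,.0f}: salary changes by "
          f"{pct_change_for_1pct_more_sales(level):.4f}%")
```

The percentage response to a 1% rise in sales is the same at every sales level and is close to b1, which is what "constant elasticity" means. The semi-log comparison shows that $100 \cdot \beta_1$ is an approximation that slightly understates the exact percentage change when $\beta_1 > 0$.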
OLS estimates
The estimated regression coefficients are random variables because they are calculated from a random sample
the data $x_i$, $y_i$, $\bar{x}$, $\bar{y}$ are random and depend on the sample
$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
Interpretation
$\hat\beta_1$ is the sample covariance between x and y divided by the sample variance of x
Key questions
are the estimators unbiased?
what are the variances of the estimators?
Standard assumptions for linear regression
Assumption SLR.1 (Linear in parameters)
$y = \beta_0 + \beta_1 x + u$: the population relationship between y and x is linear
Assumption SLR.2 (Random sampling)
$\{(x_i, y_i) : i = 1, \dots, n\}$: the data is a random sample drawn from the population
$y_i = \beta_0 + \beta_1 x_i + u_i$: therefore each data point follows the population equation
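The closed-form expressions for $\hat\beta_1$ and $\hat\beta_0$ under "OLS estimates" translate directly into code. A minimal sketch on a small hypothetical sample; the function name `ols_simple` and the data are illustrative, not from the lecture.

```python
def ols_simple(x, y):
    """Simple OLS via the closed-form formulas; returns (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    if sst_x == 0:
        # The slope is undefined when every x value is identical.
        raise ValueError("no sample variation in x")
    beta1_hat = s_xy / sst_x
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical sample of n = 5 observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = ols_simple(x, y)

# Same slope as sample covariance / sample variance (the n - 1 divisors cancel).
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sample_cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
sample_var = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
print(f"cov / var = {sample_cov / sample_var:.3f}")
```

A different sample would give different estimates, which is why the estimators are treated as random variables.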
Standard assumptions for linear regression
Assumption SLR.3 (Sample variation in explanatory variable)
the sample values $x_i$ are not all equal, i.e. $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$
Homoskedasticity
Heteroskedasticity
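Only the titles of these two slides survive, so the following is a sketch under the usual definitions, stated here as an assumption: homoskedasticity means $Var(u \mid x) = \sigma^2$ for every x, and heteroskedasticity means the error variance changes with x. The simulation design (uniform x, normal errors, standard deviation $0.5x$) is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 4000
x = [random.uniform(1.0, 10.0) for _ in range(n)]

# Homoskedastic errors: the standard deviation does not depend on x.
u_homo = [random.gauss(0.0, 2.0) for _ in x]
# Heteroskedastic errors: the standard deviation grows with x.
u_hetero = [random.gauss(0.0, 0.5 * xi) for xi in x]


def variance_by_half(x, u, cutoff=5.5):
    """Error variance for observations with low x and with high x."""
    low = [ui for xi, ui in zip(x, u) if xi < cutoff]
    high = [ui for xi, ui in zip(x, u) if xi >= cutoff]
    return statistics.pvariance(low), statistics.pvariance(high)


homo_low, homo_high = variance_by_half(x, u_homo)
hetero_low, hetero_high = variance_by_half(x, u_hetero)

print(f"homoskedastic:   var(low x) = {homo_low:.2f}, var(high x) = {homo_high:.2f}")
print(f"heteroskedastic: var(low x) = {hetero_low:.2f}, var(high x) = {hetero_high:.2f}")
```

In the first case the error spread is about the same for low and high values of x. In the second it is visibly larger for high x.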
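Statistical properties of OLS
Theorem (Unbiasedness of the error variance estimator)
If assumptions SLR.1 - SLR.5 hold then
$E(\hat\sigma^2) = \sigma^2$
Estimation of standard errors for regression coefficients
$se(\hat\beta_1) = \sqrt{\widehat{Var}(\hat\beta_1)} = \sqrt{\hat\sigma^2 / SST_x}$
$se(\hat\beta_0) = \sqrt{\widehat{Var}(\hat\beta_0)} = \sqrt{\hat\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2 / SST_x}$
where $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$
OLS standard errors
Calculation of standard errors
1. estimate the regression $\rightarrow \hat\beta_0, \hat\beta_1$
2. calculate the regression residuals: $\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$
3. calculate an estimate of the error variance: $\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$
4. use this information to calculate standard errors $\rightarrow se(\hat\beta_1), se(\hat\beta_0)$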
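The four-step calculation of standard errors can be sketched as follows, using the formulas for $\hat\sigma^2$, $se(\hat\beta_1)$ and $se(\hat\beta_0)$ given above. The sample and the function name `ols_with_standard_errors` are hypothetical.

```python
import math


def ols_with_standard_errors(x, y):
    """Follow steps 1-4: estimates, residuals, error variance, standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)

    # Step 1: estimate the regression.
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    beta0_hat = y_bar - beta1_hat * x_bar

    # Step 2: regression residuals.
    residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]

    # Step 3: error variance estimate with n - 2 degrees of freedom.
    sigma2_hat = sum(u ** 2 for u in residuals) / (n - 2)

    # Step 4: standard errors of the slope and the intercept.
    se_beta1 = math.sqrt(sigma2_hat / sst_x)
    se_beta0 = math.sqrt(sigma2_hat * (sum(xi ** 2 for xi in x) / n) / sst_x)

    return beta0_hat, beta1_hat, sigma2_hat, se_beta0, se_beta1


# Hypothetical sample of n = 5 observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat, sigma2_hat, se_beta0, se_beta1 = ols_with_standard_errors(x, y)
print(f"beta0_hat = {beta0_hat:.3f} (se {se_beta0:.3f})")
print(f"beta1_hat = {beta1_hat:.3f} (se {se_beta1:.3f})")
print(f"sigma2_hat = {sigma2_hat:.3f}")
```

The divisor n - 2 reflects the two estimated coefficients, which is what makes $\hat\sigma^2$ unbiased under SLR.1 - SLR.5.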
OLS Summary
$y = \beta_0 + \beta_1 x + u$