Simple Linear Regression - Lecture Notes
BSE3703
Topic 2
Simple Linear Regression
Learning Outcomes
Suppose LTA increases ERP charges by $1.00. What would be the quantified effect on road usage during peak hours?
Suppose NUS increases the minimum weightage of the final exam by 20%. What would be the quantified effect on the proportion of A graders?
Suppose MOM implements a minimum wage law. What would be the quantified effect on the unemployment rate of low-income workers?
Simple Linear Regression is a statistical technique which is used to quantify the effect of
one or more variable(s) on a variable of interest.
Simple Linear Regression: Introduction
Simple linear regression quantifies both the direction and the magnitude of the effect of an explanatory variable on a variable of interest.
Linear regression postulates a linear relationship between a dependent variable (y) and its
respective independent variables (x1 , x2 , … , xk ).
Simple Linear Regression: Equation
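General form with k independent variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
Simple linear regression (k = 1):
$$y = \beta_0 + \beta_1 x + u$$
where y is the dependent variable,
x is the independent variable,
β0 and β1 are population parameters, and
u is the error term.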
[Figure: scatter plot of Test Score (y) against Class Size (x), with sample points (x1, y1) = (11, 86), (x2, y2) = (28, 68) and (xn, yn) = (19, 83).]
Simple Linear Regression: Illustration
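Context: Suppose the Dean of NUS Business School wants to improve the school test score.
Theory: School test score is affected by class size.
Decision: Should he hire more lecturers to cut class size, even though doing so will increase cost?

Test Score = β0 + β1 Class Size

Draw a sample of size n:

            Test Score   Class Size
Class 1         86           11
Class 2         68           28
Class 3         73           22
…
Class n         83           19

$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1)$$
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2)$$
$$y_i = \hat{y}_i + \hat{u}_i \qquad (3)$$
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i \qquad (4)$$
where i = 1, 2, 3, … , n. In (1), $u_i$ is the disturbance term (error term); in (3) and (4), $\hat{u}_i = y_i - \hat{y}_i$ is the residual.

$$\text{Estimated Test Score} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Class Size}$$

[Figure: Test Score against Class Size with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. At $x_1 = 11$ the observed value is $y_1 = 86$, the fitted value is $\hat{y}_1$, and the residual is $\hat{u}_1 = y_1 - \hat{y}_1$.]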
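Squared Residual:
$$\hat{u}_i^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
Sum of Squared Residuals (RSS):
$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
Minimized Sum of Squared Residuals:
$$\min \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \quad \text{w.r.t. } \hat{\beta}_0 \text{ and } \hat{\beta}_1$$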
Ordinary Least Squares (OLS) Estimators
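$$\min_{\hat{\beta}_0,\,\hat{\beta}_1} RSS = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

First-order conditions:
$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (1)$$
$$\frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2)$$

From (1):
$$-2 \sum y_i + 2n\hat{\beta}_0 + 2\hat{\beta}_1 \sum x_i = 0$$
Given $\bar{x} = \frac{\sum x_i}{n}$ and $\bar{y} = \frac{\sum y_i}{n}$:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (3)$$

Substituting (3) into (2):
$$-2 \sum x_i (y_i - \bar{y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i) = 0$$
$$\sum x_i y_i - \bar{y} \sum x_i + \hat{\beta}_1 \bar{x} \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0$$
Using $\sum x_i = n\bar{x}$:
$$\hat{\beta}_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$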
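The closed-form estimators above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own computation: it treats the four rows shown in the sample table (Class 1, 2, 3 and n) as if they were the whole sample, which is an assumption, so the numbers will not match estimates obtained from the full data set. The function name `ols_simple` is hypothetical.

```python
# Four (class size, test score) rows shown in the illustration. Treating them
# as the complete sample is an assumption made only for this sketch.
class_size = [11, 28, 22, 19]
test_score = [86, 68, 73, 83]


def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # beta1_hat = (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    beta1_hat = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # beta0_hat = y_bar - beta1_hat * x_bar
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


beta0_hat, beta1_hat = ols_simple(class_size, test_score)
print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

On these four rows the slope works out to −167/150, so a larger class is associated with a lower fitted test score.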
Goodness of Fit
▪ Having estimated the regression line, we also need to know how close it lies to the scattered observed values in order to judge whether the line does a
good job of describing the relationship between Y and X.
▪ This will inform us how well the equation we have obtained accounts for the behavior of the dependent
variable.
▪ The R2 and SER (standard error of the regression) measure how well the OLS regression line fits the data.
R Squared (Coefficient of Determination)
▪ R2 is the fraction of the sample variance of yi explained (or predicted) by xi, where 0 ≤ R2 ≤ 1.
▪ Mathematically, R2 can be written as the ratio of the explained sum of squares (ESS) to the
total sum of squares (TSS):
$$R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
➢ ESS is the sum of squared deviations of the predicted values of y from the sample mean.
➢ TSS is the sum of squared deviations of the actual values of y from the sample mean.
▪ Alternatively, R2 can be written as one minus the fraction of the variance in yi not explained by xi (RSS/TSS):
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum (y_i - \bar{y})^2} \quad \text{where } RSS = \sum \hat{u}_i^2$$
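As a rough numerical check, the two expressions for R2 can be computed side by side. The sketch below again assumes that the four rows shown in the sample table are the entire sample, which is for illustration only; variable names such as `r2_from_ess` are hypothetical.

```python
# Illustrative rows only (class size, test score); an assumed mini-sample.
x = [11, 28, 22, 19]
y = [86, 68, 73, 83]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the closed-form formulas.
beta1_hat = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = [beta0_hat + beta1_hat * xi for xi in x]    # fitted values
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]       # residuals

tss = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in y_hat)        # explained sum of squares
rss = sum(ui ** 2 for ui in u_hat)                  # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(f"R2 (ESS/TSS) = {r2_from_ess:.4f}, R2 (1 - RSS/TSS) = {r2_from_rss:.4f}")
```

Both expressions give the same value up to floating-point rounding, because TSS = ESS + RSS for an OLS fit with an intercept.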
[Figure: Test Score against Class Size with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and the sample mean $\bar{y}$. At $x_1$, the total deviation $(y_1 - \bar{y})$ splits into the explained part $(\hat{y}_1 - \bar{y})$ and the residual $(y_1 - \hat{y}_1)$.]
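Proof: TSS = ESS + RSS

Recall the conditions used to derive the OLS estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$):
$$\min_{\hat{\beta}_0,\,\hat{\beta}_1} RSS = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Rightarrow\; -2 \sum \hat{u}_i = 0 \;\Rightarrow\; \sum \hat{u}_i = 0$$
$$\frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Rightarrow\; -2 \sum x_i \hat{u}_i = 0 \;\Rightarrow\; \sum x_i \hat{u}_i = 0$$

Then:
$$
\begin{aligned}
TSS &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \right]^2 \\
&= \sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y})^2 + (y_i - \hat{y}_i)^2 + 2 (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \right] \\
&= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{ESS} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{RSS} + 2 \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})\, \hat{u}_i
\end{aligned}
$$
The cross term vanishes because each of its three sums is zero:
$$2 \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})\, \hat{u}_i = 2 \Big[ \hat{\beta}_0 \underbrace{\sum_{i=1}^{n} \hat{u}_i}_{=0} + \hat{\beta}_1 \underbrace{\sum_{i=1}^{n} x_i \hat{u}_i}_{=0} - \bar{y} \underbrace{\sum_{i=1}^{n} \hat{u}_i}_{=0} \Big] = 0$$
Hence TSS = ESS + RSS.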
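The two first-order conditions that drive the proof, and the resulting decomposition, can be checked numerically. This is only a sanity check under the assumption that the four illustrated rows form the sample; it is not a substitute for the algebra, and the equalities hold only up to floating-point rounding.

```python
# Illustrative rows only (class size, test score); an assumed mini-sample.
x = [11, 28, 22, 19]
y = [86, 68, 73, 83]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

beta1_hat = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = [beta0_hat + beta1_hat * xi for xi in x]
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]

# First-order conditions: both sums should be (numerically) zero.
sum_u = sum(u_hat)
sum_xu = sum(xi * ui for xi, ui in zip(x, u_hat))

# Cross term of the expansion, which the proof shows to be zero.
cross_term = 2 * sum((fi - y_bar) * ui for fi, ui in zip(y_hat, u_hat))

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in y_hat)
rss = sum(ui ** 2 for ui in u_hat)

print(f"sum u_hat = {sum_u:.2e}, sum x*u_hat = {sum_xu:.2e}, cross term = {cross_term:.2e}")
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
```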
Standard Error of Regression (SER)
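▪ Because the population regression errors are unobserved, the SER is computed using their
sample counterparts, the OLS residuals $\hat{u}_1, \hat{u}_2, \dots, \hat{u}_n$:
$$SER = \sqrt{\frac{RSS}{df}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{df}} = \sqrt{\frac{\sum \hat{u}_i^2}{df}} \quad \text{where } RSS = \sum \hat{u}_i^2$$
In simple linear regression df = n − 2, because two parameters ($\hat{\beta}_0$ and $\hat{\beta}_1$) are estimated.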
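A short sketch of the SER calculation follows. It assumes df = n − 2 (two estimated coefficients in a simple regression) and, as before, uses only the four illustrated rows as a stand-in sample, so the value is illustrative rather than the lecture's result.

```python
import math

# Illustrative rows only (class size, test score); an assumed mini-sample.
x = [11, 28, 22, 19]
y = [86, 68, 73, 83]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

beta1_hat = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
beta0_hat = y_bar - beta1_hat * x_bar

# Residual sum of squares from the OLS residuals.
rss = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))

df = n - 2                      # assumed: intercept and slope are estimated
ser = math.sqrt(rss / df)       # SER is in the same units as y
print(f"RSS = {rss:.4f}, df = {df}, SER = {ser:.4f}")
```

Because the SER is expressed in the units of the dependent variable, it can be read as the typical size of a residual around the fitted line.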
$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1)$$
$$y_i = \hat{y}_i + \hat{u}_i \qquad (3)$$
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i \qquad (4)$$
where i = 1, 2, 3, … , n

Interpretation: when class size decreases by 1 student, the estimated test score increases by 7.893; equivalently, when class size increases by 1 student, the estimated test score decreases by 7.893.
Suppose you are a management consultant at McKinsey & Company, Singapore. A major client has engaged you to produce an
outlook for the Singapore property market in the medium term, so you construct a simple linear regression model as follows:
$$\sum (P_t - \bar{P})^2 = 1355.67, \qquad \sum (Y_t - \bar{Y})^2 = 431.52$$
b) Suppose you derived the following sample linear regression function: $P_t = 101.19 + 1.462\,Y_t + \hat{u}_t$
i. What would be your predicted property price index in 2022 if the MAS forecast of 5.9% economic growth in Singapore in
2022 holds?
ii. If economic growth in Singapore is expected to increase by 2.5 percentage points in 2022, what would be the estimated effect
on Singapore’s property price index?
c) Compute R2 and the standard error of regression (SER). Interpret the goodness of fit of your estimated OLS regression line.
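The arithmetic behind parts b) and c) can be sketched as below. Several points are assumptions rather than facts from the exercise: that Y is measured in percent (so the 5.9% forecast enters as Y = 5.9), that the two given quantities are sums of squared deviations of P and Y, and that df = n − 2. The sample size n is not stated in this excerpt, so the SER is left as a function of a hypothetical `n_obs` argument instead of being evaluated.

```python
import math

beta0_hat = 101.19      # intercept of the sample regression function
beta1_hat = 1.462       # slope on economic growth Y
tss = 1355.67           # sum of (P_t - P_bar)^2
sxx = 431.52            # sum of (Y_t - Y_bar)^2

# b) i. Predicted index at Y = 5.9 (assumes Y is in percent).
predicted_index = beta0_hat + beta1_hat * 5.9

# b) ii. Estimated effect of a 2.5 percentage-point rise in growth.
effect = beta1_hat * 2.5

# c) In simple regression, ESS = beta1_hat^2 * sum of (Y_t - Y_bar)^2.
ess = beta1_hat ** 2 * sxx
rss = tss - ess
r_squared = ess / tss


def ser(n_obs):
    """SER for a sample of n_obs observations, assuming df = n_obs - 2."""
    return math.sqrt(rss / (n_obs - 2))


print(f"predicted index = {predicted_index:.4f}")
print(f"effect of +2.5 pp = {effect:.3f}")
print(f"R2 = {r_squared:.4f}")
```

Under these assumptions R2 is roughly 0.68, meaning about two thirds of the sample variation in the property price index is accounted for by economic growth. Calling `ser` requires the actual number of observations from the exercise's data.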
Prepared by
Daniel SOH