Simple Linear Regression - Lecture Notes

Uploaded by

Shruti Mittal
Econometrics for Business I

BSE3703
Topic 2
Simple Linear Regression
Learning Outcomes

At the end of the lesson, students must be able to:


1. understand the concept and applications of simple linear regression.
2. define the statistical terms in simple linear regression.
3. derive and compute the OLS estimators for simple linear regression.
4. compute the R² to measure and interpret the goodness of fit of the OLS regression line.
5. compute the SER to measure and interpret the goodness of fit of the OLS regression line.
6. apply the simple linear regression model to explain (or predict) the dependent variable.
Simple Linear Regression: Real Applications

Suppose LTA increases ERP charges by $1.00. What would be the quantified effect on road usage during peak hours?

Suppose NUS increases the minimum weightage of the final exam by 20%. What would be the quantified effect on the proportion of A graders?

Suppose MOM implements a minimum wage law. What would be the quantified effect on the unemployment rate of low-income workers?

Simple Linear Regression is a statistical technique which is used to quantify the effect of
one or more variable(s) on a variable of interest.
Simple Linear Regression: Introduction

Simple linear regression is a statistical technique used to quantify the effect (direction and magnitude) of one or more variables (the independent or explanatory variables) on a variable of interest (the dependent or response variable).

Linear regression postulates a linear relationship between a dependent variable (y) and its
respective independent variables (x1 , x2 , … , xk ).
Simple Linear Regression: Equation


y = β0 + β1 x1 + β2 x2 + ⋯ + βk xk + u

Consider the simplified (simple) case of one independent variable:

y = β0 + β1 x
where y is the dependent variable
x is the independent variable
β0 and β1 are population parameters

β1 is the change in y when the value of x changes by one unit


β0 is the value of y when the value of x is zero
Simple Linear Regression: Illustration
Context: Suppose the Dean of NUS Business School wants to improve the school test score.
Theory: School test score is affected by class size.
Decision: Should he hire more lecturers to cut class size, even though this will increase cost?

Test Score = β0 + β1 Class Size

Draw a sample of size n:

            Test Score   Class Size
Class 1         86           11
Class 2         68           28
Class 3         73           22
…
Class n         83           19

[Figure: scatter plot of Test Score (vertical axis) against Class Size (horizontal axis), showing the sample points (x1, y1) = (11, 86), (x2, y2) = (28, 68) and (xn, yn) = (19, 83).]
Simple Linear Regression: Illustration
$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1)$$

where $u_i$ is the error term (disturbance term).

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2)$$

$$y_i = \hat{y}_i + \hat{u}_i \qquad (3)$$

where $\hat{u}_i = y_i - \hat{y}_i$ is the residual.

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i \qquad (4)$$

where i = 1, 2, 3, … , n

Estimated Test Score = $\hat{\beta}_0 + \hat{\beta}_1$ Class Size

[Figure: scatter plot of Test Score against Class Size with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. At $x_1 = 11$ the observed value is $y_1 = 86$, the fitted value is $\hat{y}_1$, and the vertical gap between them is the residual $\hat{u}_1 = y_1 - \hat{y}_1$.]
Simple Linear Regression: Illustration
$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1) \quad \text{(population regression function)}$$

where $y_i$ is the dependent variable (what we are trying to explain or estimate)
$x_i$ is the independent variable (what explains the variation in the dependent variable)
$u_i$ is the error term or disturbance term in the population regression function
$\beta_0$ and $\beta_1$ are population parameters

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2)$$

$$y_i = \hat{y}_i + \hat{u}_i \qquad (3)$$

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i \qquad (4) \quad \text{(sample regression function)}$$

where $\hat{y}_i$ is the estimated dependent variable
$\hat{u}_i$ is the residual, the sample counterpart of the error term $u_i$
$\hat{\beta}_0$ and $\hat{\beta}_1$ are sample estimators (for the population parameters)
i = 1, 2, 3, … , n
Ordinary Least Squares (OLS) Estimators
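Residual:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \qquad \text{since } \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

Squared Residual:

$$\hat{u}_i^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

Sum of Squared Residuals (RSS):

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

Minimized Sum of Squared Residuals:

$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$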
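The RSS criterion can be evaluated directly for any candidate line. The sketch below is a minimal illustration in Python. It assumes that the four class rows shown in the sample table (Class 1, 2, 3 and n) make up the whole sample, and the function name `rss` and the two candidate lines are hypothetical choices made only for illustration.

```python
# Illustrative data: the four (class size, test score) rows shown in the notes,
# treated here as the entire sample (an assumption made for this sketch).
class_size = [11, 28, 22, 19]
test_score = [86, 68, 73, 83]


def rss(b0, b1, x, y):
    """Sum of squared residuals for the candidate line y_hat = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Two hypothetical candidate lines. OLS chooses the pair (b0, b1)
# that makes this quantity as small as possible.
rss_a = rss(100.0, -1.0, class_size, test_score)
rss_b = rss(99.0, -1.0, class_size, test_score)
print(rss_a, rss_b)  # the second candidate has the smaller RSS on these points
```

Comparing candidates by hand like this only ranks the lines that were tried. The OLS estimators give the minimizing pair directly.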
Ordinary Least Squares (OLS) Estimators
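$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} RSS = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

First-order conditions:

$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (1)$$

$$\frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2)$$

From (1):

$$-2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$-2 \sum y_i + 2n\hat{\beta}_0 + 2\hat{\beta}_1 \sum x_i = 0$$

Given $\bar{x} = \dfrac{\sum x_i}{n}$ and $\bar{y} = \dfrac{\sum y_i}{n}$:

$$-2n\bar{y} + 2n\hat{\beta}_0 + 2\hat{\beta}_1 n\bar{x} = 0$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (3)$$

Substitute (3) into (2):

$$-2 \sum x_i (y_i - \bar{y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i) = 0$$

$$\sum x_i y_i - \bar{y} \sum x_i + \hat{\beta}_1 \bar{x} \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0$$

$$\sum x_i y_i - n\bar{x}\bar{y} + \hat{\beta}_1 n\bar{x}^2 - \hat{\beta}_1 \sum x_i^2 = 0$$

$$\hat{\beta}_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$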
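The two closed-form estimators can be computed in a few lines. The following is a minimal Python sketch, not a reference implementation. The function name `ols_simple` is hypothetical, and the data are the four class rows shown in the sample table, assumed here to be the full sample purely for illustration.

```python
def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # beta1_hat = (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    beta1_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # beta0_hat = y_bar - beta1_hat * x_bar
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Assumed sample: (class size, test score) for the four rows in the notes.
class_size = [11, 28, 22, 19]
test_score = [86, 68, 73, 83]

beta0_hat, beta1_hat = ols_simple(class_size, test_score)
print(round(beta0_hat, 4), round(beta1_hat, 4))
```

With only four observations the estimates are illustrative. The negative slope simply says that, in these four rows, larger classes go with lower test scores.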
Goodness of Fit

▪ The OLS method fits a regression line to a scatter diagram.

▪ But we also need to know how close it is to the scattered observed values to judge whether the line does a
good job at describing the relationship between Y and X.

▪ We want a measure that describes the closeness of fit.

▪ This will inform us how well the equation we have obtained accounts for the behavior of the dependent
variable.

▪ The R² and SER (standard error of the regression) measure how well the OLS regression line fits the data.
R Squared (Coefficient of Determination)
▪ R² is the fraction of the sample variance of yi explained (or predicted) by xi, where 0 ≤ R² ≤ 1.

▪ Mathematically, R² can be written as the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS):

$$R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$

➢ ESS is the sum of squared deviations of the predicted values of y from the sample mean.
➢ TSS is the sum of squared deviations of the actual values of y from the sample mean.

▪ Alternatively, R² can be written as one minus the fraction of the variance in yi not explained by xi (RSS/TSS):

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum (y_i - \bar{y})^2} \qquad \text{where } RSS = \sum \hat{u}_i^2$$
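Both expressions for R² can be checked numerically. The sketch below assumes the four class rows shown in the sample table are the full sample and fits the line with the closed-form OLS formulas. The variable names are illustrative, not from the notes.

```python
# Assumed sample: (class size, test score) for the four rows in the notes.
x = [11, 28, 22, 19]
y = [86, 68, 73, 83]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for the simple regression.
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained sum of squares
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(round(r2_from_ess, 4), round(r2_from_rss, 4))  # the two values agree
```

The two values coincide because TSS = ESS + RSS holds for an OLS fit that includes an intercept.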
R Squared (Coefficient of Determination)
[Figure: scatter plot of Test Score against Class Size with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and the sample mean $\bar{y}$. For the observation $(x_1, y_1)$ it marks the total deviation $(y_1 - \bar{y})$, the explained deviation $(\hat{y}_1 - \bar{y})$ and the residual $(y_1 - \hat{y}_1)$.]
Proof: TSS = ESS + RSS
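Start from the definition of TSS, then add and subtract $\hat{y}_i$:

$$
\begin{aligned}
TSS &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
    &= \sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \right]^2 \\
    &= \sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y})^2 + (y_i - \hat{y}_i)^2 + 2(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \right] \\
    &= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{ESS} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{RSS} + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
\end{aligned}
$$

Recall the conditions used to derive the OLS estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$):

$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} RSS = \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Rightarrow\; -2 \sum \hat{u}_i = 0 \;\Rightarrow\; \sum \hat{u}_i = 0$$

$$\frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Rightarrow\; -2 \sum x_i \hat{u}_i = 0 \;\Rightarrow\; \sum x_i \hat{u}_i = 0$$

Using these conditions, the cross term vanishes:

$$
\begin{aligned}
2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) &= 2 \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})\, \hat{u}_i \\
&= 2\hat{\beta}_0 \underbrace{\sum_{i=1}^{n} \hat{u}_i}_{=0} + 2\hat{\beta}_1 \underbrace{\sum_{i=1}^{n} x_i \hat{u}_i}_{=0} - 2\bar{y} \underbrace{\sum_{i=1}^{n} \hat{u}_i}_{=0} = 0
\end{aligned}
$$

Hence TSS = ESS + RSS.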
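The steps of the proof can be verified numerically. The sketch below is only a sanity check under stated assumptions: it treats the four class rows shown in the sample table as the full sample, and the variable names are illustrative. It confirms that the OLS residuals sum to zero, are orthogonal to x, and that the cross term vanishes.

```python
# Assumed sample: (class size, test score) for the four rows in the notes.
x = [11, 28, 22, 19]
y = [86, 68, 73, 83]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form OLS fit.
b1 = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]
u_hat = [b - f for b, f in zip(y, y_hat)]  # OLS residuals

sum_u = sum(u_hat)                                # first-order condition (1)
sum_xu = sum(a * u for a, u in zip(x, u_hat))     # first-order condition (2)
cross = 2 * sum((f - y_bar) * u for f, u in zip(y_hat, u_hat))

tss = sum((b - y_bar) ** 2 for b in y)
ess = sum((f - y_bar) ** 2 for f in y_hat)
rss = sum(u * u for u in u_hat)

# All three quantities are zero up to floating-point rounding.
print(sum_u, sum_xu, cross)
print(tss, ess + rss)
```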
Standard Error of Regression (SER)

▪ SER is an estimator of the standard deviation of the regression error term ui.

▪ Because the population regression errors are unobserved, the SER is computed using their sample counterparts, the OLS residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$.

$$SER = \sqrt{\frac{RSS}{df}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{df}} = \sqrt{\frac{\sum \hat{u}_i^2}{df}} \qquad \text{where } RSS = \sum \hat{u}_i^2$$

where df is the degrees of freedom of the Sum of Squared Residuals (RSS), df = n − 1 − k, and k is the number of explanatory variables in the regression model. In simple linear regression k = 1, so df = n − 2.
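As a small illustration of the SER formula with df = n − 1 − k, consider the Python sketch below. The function name `ser` and the residual values are hypothetical and chosen only so that the arithmetic is easy to follow. They are not taken from the notes.

```python
import math


def ser(residuals, k=1):
    """Standard error of the regression: sqrt(RSS / df), with df = n - 1 - k."""
    n = len(residuals)
    df = n - 1 - k
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    rss = sum(u ** 2 for u in residuals)
    return math.sqrt(rss / df)


# Hypothetical residuals: RSS = 1 + 4 + 4 + 1 = 10, n = 4, k = 1, so df = 2.
example_residuals = [1.0, -2.0, 2.0, -1.0]
ser_value = ser(example_residuals)
print(ser_value)  # sqrt(10 / 2) = sqrt(5)
```

The SER is expressed in the units of the dependent variable, so a smaller value means the observations lie closer to the fitted line.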


Simple Linear Regression: Prediction
Context: Suppose the Dean of SMU wants to improve the school test score.
Theory: School test score is affected by class size.
Decision: Should he hire more teachers to cut class size, even though this will increase cost?

Test Score = β0 + β1 Class Size

$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1)$$

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2)$$

$$y_i = \hat{y}_i + \hat{u}_i \qquad (3)$$

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i \qquad (4)$$

where i = 1, 2, 3, … , n

Estimated sample regression function:

$$\hat{y}_i = 0.001 - 7.893\, x_i$$

When class size decreases by 1 student, the predicted test score increases by 7.893; when class size increases by 1 student, the predicted test score decreases by 7.893.
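The interpretation of the estimated slope can be reproduced with a short Python sketch. The coefficients are the ones stated on the slide. The function names `predict_test_score` and `predicted_change` are hypothetical helpers introduced here for illustration.

```python
# Estimated coefficients as given in the notes: y_hat = 0.001 - 7.893 * x
BETA0_HAT = 0.001
BETA1_HAT = -7.893


def predict_test_score(class_size):
    """Predicted test score for a given class size."""
    return BETA0_HAT + BETA1_HAT * class_size


def predicted_change(delta_class_size):
    """Predicted change in test score when class size changes by delta_class_size."""
    return BETA1_HAT * delta_class_size


change_if_one_fewer = predicted_change(-1)  # one student fewer
change_if_one_more = predicted_change(+1)   # one student more
print(change_if_one_fewer, change_if_one_more)
```

Because the model is linear, the predicted change depends only on the size of the change in class size, not on the starting class size.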
Suppose you are a management consultant at McKinsey & Company, Singapore, and a major client has engaged you to produce an outlook for the Singapore property market in the medium term. You construct a simple linear regression model as follows:

$$P_t = \hat{\beta}_0 + \hat{\beta}_1 Y_t + \hat{u}_t \qquad \text{using } t = 1990, 1991, \ldots, 2020$$

where $P_t$ is Singapore's property price index (2010=100)
$Y_t$ is Singapore's real economic growth rate (in percentage terms)

Statistical Findings

$$\sum P_t = 3345.28 \qquad \sum Y_t = 180.67 \qquad \sum \hat{u}_t^2 = 152.24$$

$$\sum (P_t - \bar{P})^2 = 1355.67 \qquad \sum (Y_t - \bar{Y})^2 = 431.52$$

$$\sum P_t^2 = 362352.39 \qquad \sum Y_t^2 = 1484.46 \qquad \sum P_t Y_t = 20243.56$$

a) Based on your statistical findings, compute $\hat{\beta}_0$ and $\hat{\beta}_1$.

b) Suppose you derived the following sample linear regression function: $P_t = 101.19 + 1.462\,Y_t + \hat{u}_t$
i. What would be your predicted property price index in 2022 if the MAS forecast of 5.9% economic growth in Singapore in 2022 is true?
ii. If economic growth in Singapore is expected to increase by 2.5 percentage points in 2022, what would be the estimated effect on Singapore's property price index?

c) Compute R² and the standard error of regression (SER). Interpret the goodness of fit of your estimated OLS regression line.
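One possible way to organise the calculations for this exercise is sketched below in Python. It is a sketch under stated assumptions rather than an official solution. It assumes annual data for 1990 to 2020 inclusive (so n = 31), takes Y as the explanatory variable and P as the dependent variable, applies the closed-form OLS formulas to the raw sums, and uses R² = 1 − RSS/TSS and SER = sqrt(RSS/(n − 2)). All variable names are illustrative.

```python
import math

# Summary statistics as given in the exercise.
n = 31                      # assumption: one observation per year, 1990..2020
sum_P, sum_Y = 3345.28, 180.67
sum_Y2 = 1484.46
sum_PY = 20243.56
rss = 152.24                # sum of squared residuals
tss = 1355.67               # sum of squared deviations of P from its mean

# (a) Closed-form OLS estimates with Y as x and P as y.
P_bar = sum_P / n
Y_bar = sum_Y / n
beta1_hat = (sum_PY - n * Y_bar * P_bar) / (sum_Y2 - n * Y_bar ** 2)
beta0_hat = P_bar - beta1_hat * Y_bar

# (b) Uses the regression function stated in part (b), not the estimates from (a).
B0_GIVEN, B1_GIVEN = 101.19, 1.462
predicted_index_2022 = B0_GIVEN + B1_GIVEN * 5.9   # growth forecast of 5.9%
effect_of_2_5_points = B1_GIVEN * 2.5              # growth rises by 2.5 points

# (c) Goodness of fit; k = 1 explanatory variable, so df = n - 2.
r_squared = 1 - rss / tss
ser = math.sqrt(rss / (n - 1 - 1))

print(round(beta0_hat, 3), round(beta1_hat, 3))
print(round(predicted_index_2022, 4), round(effect_of_2_5_points, 3))
print(round(r_squared, 4), round(ser, 4))
```

The denominator computed from the raw sums is close to the stated value of 431.52 for the sum of squared deviations of Y, with the small gap consistent with rounding in the reported figures.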
Prepared by

Daniel SOH
