Lecture 4 Linear Regression Models
Dinesh K. Vishwakarma, Ph.D.
Learning Objectives
Learning Objectives…
7. Correlation Models
8. Link between a correlation model and a
regression model
9. Test of coefficient of Correlation
What is a Model?
1. Representation of Some Phenomenon
2. Non-Maths/Stats Model
• Non-metric Formula: BMI = (Weight in pounds × 703) / (Height in inches)²
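The non-metric formula can be checked with a few lines of code. This is a minimal sketch: the function name `bmi_imperial` and the sample weight and height are illustrative assumptions, not values from the lecture.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches: weight * 703 / height^2."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 150 lb and 65 in tall.
example_bmi = bmi_imperial(150, 65)
print(round(example_bmi, 1))
```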
Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure of newborns is 6
Times the Age in days + Random Error
• 𝑆𝐵𝑃 = 6 × 𝑎𝑔𝑒(𝑑) + 𝜀
• Random Error may be due to factors other than
age in days (e.g. Birth weight)
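The two components can be made concrete with a small simulation. This is a sketch only: the Gaussian form of the error, its standard deviation `noise_sd`, and the function name `simulate_sbp` are assumptions for illustration; the lecture states only that SBP is 6 times the age in days plus a random error.

```python
import random


def simulate_sbp(age_days, noise_sd=2.0, rng=None):
    """One simulated SBP value: deterministic part plus random error."""
    if rng is None:
        rng = random.Random()
    deterministic = 6 * age_days          # deterministic component
    error = rng.gauss(0.0, noise_sd)      # random error (assumed Gaussian)
    return deterministic + error


rng = random.Random(0)  # fixed seed so the run is repeatable
for age in (1, 2, 3):
    print(age, round(simulate_sbp(age, rng=rng), 2))
```

With `noise_sd=0` the function returns exactly 6 times the age, which isolates the deterministic component.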
Bivariate & multivariate models
Bivariate or simple regression model
(Education) x → y (Income)
Models Facts
Thinking Challenge: Which is more
logical?
[Figure: four candidate plots of Grade, numbered 1 to 4]
[Figure: six scatter plots of Y against X with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]
Types of Relationship
[Figure: four scatter plots of Y against X, numbered 1 to 4]
Types of Relationship…
[Figure: scatter plots of Y against X contrasting strong relationships and weak relationships]
Types of Relationship…
[Figure: scatter plot of Y against X showing no relationship]
Linear Regression Models
Linear regression is one of the simplest statistical models used in machine learning.
Types of Regression
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Stepwise Regression

Basis | Linear Regression | Logistic Regression
Core Concept | The data is modelled using a straight line | The data is modelled using a sigmoid
Used with | Continuous Variable | Categorical Variable
Output/Prediction | Value of the variable | Probability of occurrence of an event
Accuracy and Goodness of Fit | Measured by loss, R squared, Adjusted R squared etc. | Measured by Accuracy, Precision, Recall, F1 score, ROC curve, Confusion Matrix, etc.
Applications of LR
Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
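The slope and intercept definitions can be illustrated by recovering a line from two points. This is a small sketch; the helper name `line_through` and the two sample points are assumptions chosen for illustration.

```python
def line_through(p1, p2):
    """Return (m, b) for the line Y = m*X + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # change in Y divided by change in X
    b = y1 - m * x1             # value of Y where X = 0
    return m, b


m, b = line_through((1, 3), (3, 7))
print(m, b)  # 2.0 1.0
```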
Linear Regression Model
𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑿𝒊 + 𝜺𝒊
Sum of squared differences = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
Sum of squared differences = (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99
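The two sums compare the same four observed values against two candidate sets of predictions, and the smaller sum marks the better fit. The sketch below recomputes both; the variable names are illustrative assumptions, and the numbers are read directly from the two expressions above.

```python
observed = [2, 4, 1.5, 3.2]
candidate_a = [1, 2, 3, 4]            # predictions in the first expression
candidate_b = [2.5, 2.5, 2.5, 2.5]    # predictions in the second expression


def sum_squared_differences(y, y_hat):
    """Sum of squared gaps between observed and predicted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


sse_a = sum_squared_differences(observed, candidate_a)
sse_b = sum_squared_differences(observed, candidate_b)
print(round(sse_a, 2), round(sse_b, 2))  # 7.89 3.99
```

Term by term, the first sum is 1 + 4 + 2.25 + 0.64 = 7.89 and the second is 0.25 + 2.25 + 1 + 0.49 = 3.99, so the second candidate fits these points more closely.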
[Figure: population regression line 𝐸(𝑌) = 𝛽0 + 𝛽1 𝑋𝑖 plotted on X and Y axes; each observed value is 𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖, where 𝜀𝑖 is the random error]
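Simple Linear Regression Model
[Figure: fitted sample line \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i plotted on X and Y axes, showing observed values, an unsampled observation, and the estimated random error \hat{\varepsilon}_i]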
Estimating Parameters:
Least Squares Method
Scatter plot
[Figure: scatter plot of Y against X, both axes running from 0 to 60]
EPI 809/Spring 2008
Thinking Challenge
[Figure: scatter plot of Y against X, horizontal axis running from 0 to 60]
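Least Squares Error
\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2
SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
Sample Y-intercept: \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}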
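Finding 𝜷𝟎 MSE Method
\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2
0 = \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2
\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \left(y_i^2 + \beta_0^2 + \beta_1^2 x_i^2 + 2\beta_0 \beta_1 x_i - 2 y_i \beta_0 - 2 y_i \beta_1 x_i\right) = 0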
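Carrying the differentiation through term by term leads to the intercept estimate. The steps below are not on the slide and are filled in as a sketch; \bar{x} and \bar{y} denote the sample means of x and y.

```latex
\begin{align*}
\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2
  &= \sum_{i=1}^{n} \left(2\beta_0 + 2\beta_1 x_i - 2 y_i\right) = 0 \\
n \beta_0 &= \sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i \\
\beta_0 &= \bar{y} - \beta_1 \bar{x}
\end{align*}
```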
Computation Table
Xi | Yi | Xi² | Yi² | XiYi
X1 | Y1 | X1² | Y1² | X1Y1
X2 | Y2 | X2² | Y2² | X2Y2
: | : | : | : | :
Xn | Yn | Xn² | Yn² | XnYn
ΣXi | ΣYi | ΣXi² | ΣYi² | ΣXiYi
Interpretation of Coefficients
Slope (\hat{\beta}_1)
• \hat{\beta}_1 > 0 Positive Association
• \hat{\beta}_1 < 0 Negative Association
• \hat{\beta}_1 = 0 No Association
Y-Intercept (\hat{\beta}_0)
• Average Value of Y When X = 0
• If \hat{\beta}_0 = 4, then Average Y is expected to be 4 when X is 0
E.g. Parameter Estimation
What is the relationship between
Mother’s Estriol level & Birthweight using the
following data?
Estriol Birthweight
(mg/24h) (g/1000)
1 1
2 1
3 2
4 2
5 4
Scatterplot
Birthweight vs. Estriol level
[Figure: scatter plot with Estriol level (0 to 6) on the horizontal axis and Birthweight (0 to 4) on the vertical axis]
Parameter Estimation Solution Table
Xi | Yi | Xi² | Yi² | XiYi
1 | 1 | 1 | 1 | 1
2 | 1 | 4 | 1 | 2
3 | 2 | 9 | 4 | 6
4 | 2 | 16 | 4 | 8
5 | 4 | 25 | 16 | 20
ΣXi = 15 | ΣYi = 10 | ΣXi² = 55 | ΣYi² = 26 | ΣXiYi = 37
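Parameter Estimation Solution
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = 0.70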
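The slope calculation, and the intercept that follows from it, can be reproduced from the five data points. This is a minimal sketch using the computational formulas from the lecture; the function name `least_squares` is an assumption for illustration.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    intercept = sum_y / n - slope * sum_x / n   # y-bar minus slope times x-bar
    return slope, intercept


estriol = [1, 2, 3, 4, 5]        # mg/24h
birthweight = [1, 1, 2, 2, 4]    # g/1000
slope, intercept = least_squares(estriol, birthweight)
print(round(slope, 2), round(intercept, 2))  # 0.7 -0.1
```

The fitted line is therefore birthweight = -0.1 + 0.7 × estriol, in the units of the table.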
Goodness: Variation Measures
Total sum of squares: \sum (y_i - \bar{y})^2
Explained sum of squares: \sum (\hat{y}_i - \bar{y})^2
Unexplained sum of squares: \sum (y_i - \hat{y}_i)^2
Fitted line: \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i
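The three variation measures can be computed for the estriol and birthweight data to see that the total splits into an explained and an unexplained part. This is a sketch that assumes the fitted coefficients -0.1 and 0.7 from the lecture's worked example; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7   # fitted intercept and slope from the worked example

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)             # explained
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # unexplained

print(round(sst, 2), round(ssr, 2), round(sse, 2))
```

For these data the total of 6 splits into 4.9 explained and 1.1 unexplained.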
Estimation of σ²
s^2 = \frac{SSE}{n - 2}, where SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
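Slope Coefficient Test Statistic
t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{\hat{\beta}_1}{s / \sqrt{SS_{xx}}}, \quad df = n - 2
SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}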
E.g. Test of Slope Coefficient
You’re a marketing analyst for any Toys.
You find \hat{\beta}_0 = -.1, \hat{\beta}_1 = .7 and s = .6055.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Is the relationship significant
at the .05 level of significance?
Solution Table
S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{0.6055}{\sqrt{55 - \frac{15^2}{5}}} = .1914
t = \frac{0.70}{.1914} = 3.657
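Test of Slope Coefficient Solution
H0: β1 = 0
Ha: β1 ≠ 0
α = .05
df = 5 - 2 = 3
Critical Value(s): t = ±3.182 (rejection region of .025 in each tail)
Test Statistic: t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{0.70}{.1914} = 3.657
Decision: 3.657 > 3.182, so reject H0 at the .05 level; the relationship is significant.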
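The whole test can be reproduced from the raw data. This is a sketch under stated assumptions: the function name `slope_t_test` is hypothetical, and the critical value 3.182 is taken from the slide rather than computed from a t distribution.

```python
import math


def slope_t_test(x, y, t_critical):
    """Two-sided t test of H0: slope = 0. Returns (t, df, reject_h0)."""
    n = len(x)
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b1 = ss_xy / ss_xx
    b0 = sum(y) / n - b1 * sum(x) / n
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # estimate of sigma
    se_b1 = s / math.sqrt(ss_xx)        # standard error of the slope
    t = b1 / se_b1
    return t, n - 2, abs(t) > t_critical


t_stat, df, reject_h0 = slope_t_test([1, 2, 3, 4, 5], [1, 1, 2, 2, 4], 3.182)
print(round(t_stat, 3), df, reject_h0)
```

Without intermediate rounding the statistic comes out near 3.656 rather than 3.657; the small gap arises because the slide divides by the rounded standard error .1914. The decision to reject H0 is unchanged.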
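r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}}
where
SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}
SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}
SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}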
Correlation Coefficient Values
xi | yi | xi² | yi² | xiyi
1 | 1 | 1 | 1 | 1
2 | 1 | 4 | 1 | 2
3 | 2 | 9 | 4 | 6
4 | 2 | 16 | 4 | 8
5 | 4 | 25 | 16 | 20
Σxi = 15 | Σyi = 10 | Σxi² = 55 | Σyi² = 26 | Σxiyi = 37
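Coefficient of Correlation Solution
SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 55 - \frac{(15)^2}{5} = 10
SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 26 - \frac{(10)^2}{5} = 6
SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 37 - \frac{(15)(10)}{5} = 7
r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} = \frac{7}{\sqrt{10 \cdot 6}} = .904
Because the correlation coefficient is high, Y can be predicted from X using linear regression.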
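The same calculation can be written as a short function. This is a minimal sketch based on the SS formulas from the lecture; the function name `correlation` is an assumption for illustration.

```python
import math


def correlation(x, y):
    """Coefficient of correlation r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = correlation([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(round(r, 3))  # 0.904
```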
Coefficient of Correlation Challenge
You’re an economist for the county cooperative.
You gather the following data:
Fertilizer (lb.) Yield (lb.)
4 3.0
6 5.5
10 6.5
12 9.0
Find the coefficient of correlation.
Solution Table*
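xi | yi | xi² | yi² | xiyi
4 | 3.0 | 16 | 9.00 | 12
6 | 5.5 | 36 | 30.25 | 33
10 | 6.5 | 100 | 42.25 | 65
12 | 9.0 | 144 | 81.00 | 108
Σxi = 32 | Σyi = 24 | Σxi² = 296 | Σyi² = 162.50 | Σxiyi = 218

SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 218 - \frac{(32)(24)}{4} = 26
r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} = \frac{26}{\sqrt{40 \cdot 18.5}} = .956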
Coefficient of Determination
Proportion of variation ‘explained’ by relationship
between x and y
[Figure: two scatter plots of Y against X, each with r² = 1]
E.g. Approximate r² Values…
r² = 0
• No linear relationship between X and Y:
• The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)
[Figure: scatter plot of Y against X with r² = 0]
E.g. Determination Coefficient
You’re a marketing analyst for any Toys. You
know r = .904.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Calculate and interpret the
coefficient of determination.
E.g. Determination Coefficient
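r² = (coefficient of correlation)²
r² = (.904)²
r² = .817
Interpretation: about 81.7% of the variation in sales is explained by the linear relationship with ad spending.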
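The value can be cross-checked without rounding r first, since r² = SSxy² / (SSxx · SSyy). The sketch below uses the SS values from the lecture's correlation solution (SSxx = 10, SSyy = 6, SSxy = 7); the variable names are illustrative assumptions.

```python
ss_xx, ss_yy, ss_xy = 10, 6, 7   # values from the correlation solution

r_squared_exact = ss_xy ** 2 / (ss_xx * ss_yy)   # 49 / 60
r_squared_from_r = 0.904 ** 2                    # squaring the rounded r

print(round(r_squared_exact, 3), round(r_squared_from_r, 3))  # 0.817 0.817
```

Both routes agree to three decimal places.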