Linear Regression
[Figure: y, the dependent variable (output), plotted against the independent variable X]
Types of Regression Models
Regression models are classified by the number of features:
• Simple (1 feature): Linear or Non-Linear
• Multiple (2+ features): Linear or Non-Linear
Simple Linear Regression Equation
The regression line gives the expected value of y as a linear function of x:
E(y) = β0 + β1x
where β0 is the intercept and β1 is the slope.
Linear Regression Model
• The relationship between the variables is a linear function:
Y = β0 + β1x1 + ε
Sample of 15 houses from the region:

House Number   Y: Actual Selling Price   X: House Size (100s ft²)
 1              89.5                      20.0
 2              79.9                      14.8
 3              83.1                      20.5
 4              56.9                      12.5
 5              66.6                      18.0
 6              82.5                      14.3
 7             126.3                      27.5
 8              79.3                      16.5
 9             119.9                      24.3
10              87.6                      20.2
11             112.6                      22.0
12             120.8                      19.0
13              78.5                      12.3
14              74.3                      14.0
15              74.8                      16.7
Averages        88.84                     18.17
[Scatter plot: house price vs. size]
Linear Regression – Multiple Variables
Yi = b0 + b1X1 + b2X2 + ⋯ + bpXp + e
Regression Model
• Our model assumes that
E(Y | X = x) = b0 + b1x (the “population line”)
• The coefficients are chosen to minimize the sum of squared errors (SSE):
SSE = Σᵢ₌₁ⁿ [yᵢ − (b0 + b1xᵢ)]²
• To find the values of the coefficients that minimize the objective function (SSE), we take the partial derivatives of the SSE with respect to the coefficients, set these to zero, and solve.
b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
b0 = (Σy − b1 Σx) / n
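As an illustration (not part of the original notes), here is a minimal Python sketch that applies these closed-form estimates to the house data tabulated above; NumPy is assumed to be available and the variable names are arbitrary.

```python
import numpy as np

# House data from the table above: x = size (100s ft^2), y = selling price.
x = np.array([20.0, 14.8, 20.5, 12.5, 18.0, 14.3, 27.5, 16.5,
              24.3, 20.2, 22.0, 19.0, 12.3, 14.0, 16.7])
y = np.array([89.5, 79.9, 83.1, 56.9, 66.6, 82.5, 126.3, 79.3,
              119.9, 87.6, 112.6, 120.8, 78.5, 74.3, 74.8])

n = len(x)
# Slope: b1 = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
# Intercept: b0 = (Sum(y) - b1*Sum(x)) / n
b0 = (np.sum(y) - b1 * np.sum(x)) / n

print(f"fitted line: E(y) = {b0:.2f} + {b1:.2f} x")
```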
Multiple Linear Regression
Y = b0 + b1X1 + b2X2 + ⋯ + bnXn
h(x) = Σᵢ₌₀ⁿ βᵢxᵢ
• There is a closed form, which requires matrix inversion (see the sketch below).
• There are iterative techniques to find the weights, e.g. the delta rule (also called the LMS method), which updates the weights towards the objective of minimizing the SSE.
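A minimal sketch of the closed form mentioned above, assuming NumPy; the helper name fit_linear_regression is hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Closed-form least squares: beta = (X^T X)^(-1) X^T y.

    X is an (m, p) feature matrix; a column of ones is prepended so that
    beta[0] plays the role of the intercept b0.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    # Solving the normal equations directly is more stable than forming the inverse.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
```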
Linear Regression
h(x) = Σᵢ₌₀ⁿ βᵢxᵢ
How do we learn the parameters θ (the βᵢ)?
• Make h(x) close to y, for the available training
examples.
• Define a cost function J(θ):
J(θ) = ½ Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• Find 𝜃 that minimizes J(𝜃).
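The cost function translates directly into code. The sketch below (not from the notes) assumes a design matrix X whose first column is all ones and a parameter vector theta, so that h(x) = X @ theta.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (h(x_i) - y_i)^2, with h(x) = X @ theta."""
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)
```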
LMS Algorithm
• Start a search algorithm (e.g. gradient descent) with an initial guess of θ.
• Repeatedly update θ to make J(θ) smaller, until it converges to a minimum.
βj := βj − α ∂J(θ)/∂βj
• J is a convex quadratic function, so it has a single global minimum; gradient descent eventually converges to it.
• At each iteration the algorithm takes a step in the direction of steepest descent (the negative direction of the gradient), as sketched in the code below.
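One steepest-descent step on J can be written as follows; this is a sketch under the same assumptions as before (design matrix X with a leading column of ones, an illustrative learning rate alpha).

```python
import numpy as np

def gradient_step(theta, X, y, alpha):
    """theta_j := theta_j - alpha * dJ/dtheta_j, for every j at once.

    The gradient of J(theta) is X^T (X theta - y), so stepping in its
    negative direction is the steepest-descent update.
    """
    grad = X.T @ (X @ theta - y)
    return theta - alpha * grad
```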
LMS Update Rule
• If you have only one training example (𝑥, 𝑦)
∂J(θ)/∂θj = ∂/∂θj [½ (h(x) − y)²]
          = 2 · ½ · (h(x) − y) · ∂/∂θj (h(x) − y)
          = (h(x) − y) · ∂/∂θj (Σᵢ₌₀ⁿ θᵢxᵢ − y)
          = (h(x) − y) xj
• For a single training example, this gives the update
rule:
βj := βj + α (y⁽ⁱ⁾ − h(x⁽ⁱ⁾)) xj⁽ⁱ⁾
Batch update, using all m training examples (both update schedules are sketched in code below):
Repeat until convergence {
    θj := θj + α Σᵢ₌₁ᵐ (y⁽ⁱ⁾ − h(x⁽ⁱ⁾)) xj⁽ⁱ⁾    (for every j)
}
Stochastic (incremental) update:
Repeat {
    for i = 1 to m do
        θj := θj + α (y⁽ⁱ⁾ − h(x⁽ⁱ⁾)) xj⁽ⁱ⁾    (for every j)
    end for
} until convergence
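Both update schedules above can be sketched in Python as follows; the learning rate and iteration counts are illustrative placeholders (a real implementation would test for convergence rather than running a fixed number of passes).

```python
import numpy as np

def lms_batch(X, y, alpha=0.01, n_iters=1000):
    """Batch LMS: each update sums the error over all m examples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij, for every j at once
        theta += alpha * X.T @ (y - X @ theta)
    return theta

def lms_stochastic(X, y, alpha=0.01, n_epochs=100):
    """Stochastic LMS: update after every single example (the inner loop above)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            theta += alpha * (y_i - x_i @ theta) * x_i
    return theta
```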
Assumptions of the linear regression model
1. There should be a linear and additive relationship between the dependent (response) variable and the independent (predictor) variable(s). A linear relationship means that the change in the response Y due to a one-unit change in X¹ is constant, regardless of the value of X¹. An additive relationship means that the effect of X¹ on Y is independent of the other variables.
2. There should be no correlation between the residual (error) terms. The presence of correlated error terms is known as autocorrelation.
3. The independent variables should not be correlated with one another. Correlation among the independent variables is known as multicollinearity.
4. The error terms must have constant variance. This property is known as homoskedasticity; the presence of non-constant variance is referred to as heteroskedasticity.
5. The error terms must be normally distributed.
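As a rough illustration of how assumptions 2 to 5 might be checked in practice, the sketch below uses statsmodels and scipy (assumed to be available); the function name, the particular tests chosen, and the rules of thumb in the comments are illustrative additions, not prescribed by these notes.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import shapiro

def check_assumptions(X, y):
    """Rough diagnostics for assumptions 2-5 on an ordinary least squares fit."""
    Xc = sm.add_constant(np.asarray(X, dtype=float))  # prepend intercept column
    resid = sm.OLS(y, Xc).fit().resid

    # 2. Autocorrelation of residuals (Durbin-Watson near 2 suggests little autocorrelation).
    print("Durbin-Watson:", durbin_watson(resid))

    # 3. Multicollinearity: variance inflation factor per predictor (large VIFs are a warning sign).
    for i in range(1, Xc.shape[1]):
        print(f"VIF for predictor {i}:", variance_inflation_factor(Xc, i))

    # 4. Heteroskedasticity: Breusch-Pagan test (small p-value suggests non-constant variance).
    print("Breusch-Pagan p-value:", het_breuschpagan(resid, Xc)[1])

    # 5. Normality of residuals: Shapiro-Wilk test (small p-value suggests non-normality).
    print("Shapiro-Wilk p-value:", shapiro(resid)[1])
```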