Linear Regression

Linear regression is a machine learning algorithm used for predictive modeling problems. It finds the linear relationship between a dependent variable (y) and one or more independent variables (x). The goal is to fit a linear equation to the observed data so that it can be used to predict the values of y for given values of x. The linear regression line is the line of best fit that minimizes the sum of the squared residuals between the observed and predicted y-values. Model parameters are estimated using the method of least squares.


Linear Regression

Machine Learning Algorithm


• Supervised machine learning algorithm
• Given a set of (xi, yi) pairs [input features and output variable] for a given
problem statement, a supervised model can be built using one of the
following approaches:
• Classification
• Regression
• The technique implemented depends on the type of problem
• If the requirement is to categorize input data into a fixed number of given classes,
classification is used
• If the requirement is to predict a numerical value, a regression model is used
Machine learning model
The data set is divided into training and testing parts in an 80:20 or 70:30 ratio. The training data (xi, yi)
is fed into the model for training. Once the model is trained, its performance is evaluated using the test
data: the xi of the test data are given to the trained model and the predicted output y’ is obtained. The
model is tuned to reduce the difference between y and y’, as in the sketch below.
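A minimal sketch of this train/test workflow, using scikit-learn (an assumption; any library or a hand-written split would do). The feature matrix X and target y here are synthetic, purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # hypothetical input feature xi
y = 3.0 * X[:, 0] + 2.0 + rng.normal(size=100)   # hypothetical output variable yi

# 80:20 split of the data set into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # train on (xi, yi)
y_pred = model.predict(X_test)                     # predicted output y'
print(mean_squared_error(y_test, y_pred))          # difference between y and y'
```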
Linear Regression
• What is Linear Regression?
• Linear Regression is used for predictive analysis. It is a technique
that explains the degree of relationship between two or more
variables (multiple regression, in that case) using a best-fit line / plane.
• Simple Linear Regression is used when we have one independent
variable (X) and one dependent variable (Y).
• The regression technique tries to fit a single line through a scatter plot.
The simplest form of regression, with one dependent and one
independent variable, is defined by the formula:
• Y = aX + b
Regression
• In regression the output is continuous
• The objective is to fit the line which best represents the data points in the problem space.
• Each estimated line represents a hypothesis function; we try to obtain the best hypothesis by reducing the error
• Many models could be used – the simplest is linear regression
• Fit the data with the best hyper-plane which "goes through" the points

[Figure: scatter plot with the dependent variable y (output) on the vertical axis and the independent variable x (input) on the horizontal axis]

Linear Regression models
• Given an input x, compute an output y. For example:
- Predict height from age
- Predict house price from house area
- Predict distance from wall from sensors
• ERROR: we may use a loss function that measures the squared error in the prediction of y(x) from x.
Types of Regression Models
Regression models are grouped by the number of features and the form of the fit:
• Simple (1 feature): linear or non-linear
• Multiple (2+ features): linear or non-linear
Simple Linear Regression Equation
[Figure: regression line E(y) plotted against x, showing the intercept b0 and the slope β1]
Linear Regression Model
• The relationship between the variables is a linear function:

Y = β0 + β1x1 + ϵ

where β0 is the population Y-intercept, β1 is the population slope, and ϵ is the random error.
A sample of 15 houses from the region:

House Number   Y: Actual Selling Price   X: House Size (100s ft²)
1              89.5                      20.0
2              79.9                      14.8
3              83.1                      20.5
4              56.9                      12.5
5              66.6                      18.0
6              82.5                      14.3
7              126.3                     27.5
8              79.3                      16.5
9              119.9                     24.3
10             87.6                      20.2
11             112.6                     22.0
12             120.8                     19.0
13             78.5                      12.3
14             74.3                      14.0
15             74.8                      16.7
Averages       88.84                     18.17
House price vs size
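A short sketch of fitting the line of best fit to the 15 sample houses in the table above, using NumPy's least-squares polyfit (an assumption; any least-squares routine would do).

```python
import numpy as np

# House size in 100s of ft² and actual selling price, taken from the table above
size = np.array([20.0, 14.8, 20.5, 12.5, 18.0, 14.3, 27.5, 16.5,
                 24.3, 20.2, 22.0, 19.0, 12.3, 14.0, 16.7])
price = np.array([89.5, 79.9, 83.1, 56.9, 66.6, 82.5, 126.3, 79.3,
                  119.9, 87.6, 112.6, 120.8, 78.5, 74.3, 74.8])

# Degree-1 least-squares fit: price ≈ slope * size + intercept
slope, intercept = np.polyfit(size, price, deg=1)
print(slope, intercept)
print(intercept + slope * 20.0)  # predicted price for a 2000 ft² house
```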
Linear Regression – Multiple Variables
Yi = b0 + b1X1 + b2X2 + … + bpXp + ε

• b0 is the intercept (i.e. the average value of Y if all the X’s are zero); bj is the slope for the jth variable Xj
Regression Model
• Our model assumes that E(Y | X = x) = b0 + b1x (the “population line”)

• Population line:     Yi = b0 + b1X1 + b2X2 + … + bpXp + ε
• Least squares line:  Ŷi = b̂0 + b̂1X1 + b̂2X2 + … + b̂pXp

• We use b̂0 through b̂p as guesses for b0 through bp, and Ŷi as a guess for Yi. The guesses will not be perfect.
Assumption
• The data may not form a perfect line.
• When we actually take a measurement (i.e., observe the data), we observe:
Yi = b0 + b1Xi + εi,
where εi is the random error associated with the ith observation.
The regression line
The least-squares regression line is the unique line such
that the sum of the squared vertical (y) distances between
the data points and the line is the smallest possible.
Criterion for choosing what line to draw:
method of least squares
• The method of least squares chooses the line (b̂0 and b̂1) that makes the sum of the squared residuals Σ εi² as small as possible
• It minimizes

Σᵢ₌₁ⁿ [yi − (b0 + b1xi)]²

for the given observations (xi, yi)

How do we "learn" parameters
• For the 2-d problem
Y = b0 + b1X

• To find the values of the coefficients which minimize the objective function, we
take the partial derivatives of the objective function (SSE) with respect to the
coefficients, set these to 0, and solve:

b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
b0 = (Σy − b1 Σx) / n
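A small sketch implementing these closed-form formulas directly from the sums (hypothetical toy data; plain Python is used so the arithmetic is visible).

```python
# Closed-form simple linear regression coefficients from the summation formulas above.
def fit_simple_linear(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

# Tiny example where y = 2x + 1 exactly:
b0, b1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # -> 1.0, 2.0
```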
Multiple Linear Regression
Y = b0 + b1X1 + b2X2 + … + bnXn

h(x) = Σᵢ₌₀ⁿ βi xi   (taking x0 = 1 for the intercept)

• There is a closed form which requires matrix
inversion, etc.
• There are iterative techniques to find the weights
• The delta rule (also called the LMS method) updates the weights
towards the objective of minimizing the SSE.
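A sketch of the closed form mentioned above (the normal equation β = (XᵀX)⁻¹Xᵀy), using NumPy on synthetic data; the data and the true coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X_raw = rng.normal(size=(n, p))                  # p input features
true_beta = np.array([4.0, 2.0, -1.0, 0.5])      # intercept + 3 slopes (made up)
y = true_beta[0] + X_raw @ true_beta[1:] + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), X_raw])         # add the x0 = 1 column
# Solve (X^T X) beta = X^T y; np.linalg.solve is preferred over an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [4.0, 2.0, -1.0, 0.5]
```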
Linear Regression
h(x) = Σᵢ₌₀ⁿ βi xi

How do we learn the parameters θ (the βi)?
• Make h(x) close to y for the available training examples.
• Define a cost function J(θ):

J(θ) = ½ Σᵢ₌₁ᵐ (h(x(i)) − y(i))²

• Find θ that minimizes J(θ).
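A small sketch of this cost function in NumPy (the data rows here are hypothetical; x includes the leading 1 for the intercept).

```python
import numpy as np

# J(theta) = (1/2) * sum_i (h(x_i) - y_i)^2 for the linear hypothesis h(x) = theta . x
def cost(theta, X, y):
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # [1, x] rows
y = np.array([3.0, 5.0, 7.0])
print(cost(np.array([1.0, 2.0]), X, y))  # 0.0: theta = (1, 2) fits exactly
print(cost(np.array([0.0, 0.0]), X, y))  # 41.5: far from the data
```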
LMS Algorithm
• Start a search algorithm (e.g. gradient descent) with an initial guess of θ.
• Repeatedly update θ to make J(θ) smaller, until it converges to a minimum:

βj := βj − α ∂J(θ)/∂βj

• J is a convex quadratic function, so it has a single global minimum; gradient descent
eventually converges to the global minimum.
• At each iteration the algorithm takes a step in the direction of steepest descent
(the negative direction of the gradient).
LMS Update Rule
• If you have only one training example (x, y):

∂J(θ)/∂θj = ∂/∂θj [ ½ (h(x) − y)² ]
          = 2 · ½ · (h(x) − y) · ∂/∂θj (h(x) − y)
          = (h(x) − y) · ∂/∂θj ( Σᵢ₌₀ⁿ θi xi − y )
          = (h(x) − y) xj

• For a single training example, this gives the update
rule:
βj := βj + α (y(i) − h(x(i))) xj(i)
m training examples
Repeat until convergence {
  θj := θj + α Σᵢ₌₁ᵐ (y(i) − h(x(i))) xj(i)   (for every j)
}

Batch Gradient Descent: looks at every example on each step, as in the sketch below.
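A sketch of batch gradient descent for linear regression in NumPy. The synthetic data, learning rate α and iteration count are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
x = rng.uniform(0, 10, size=m)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=m)   # hypothetical data

X = np.column_stack([np.ones(m), x])   # x0 = 1 column for the intercept
theta = np.zeros(2)
alpha = 1e-4

for _ in range(5000):
    # Full-batch step: theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij
    errors = y - X @ theta
    theta += alpha * (X.T @ errors)

print(theta)  # approaches the least-squares estimates (~1.5, ~0.8)
```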
Stochastic gradient descent
• Repeatedly run through the training set.
• Whenever a training point is encountered, update the parameters
according to the gradient of the error with respect to that training example
only.

Repeat {
  for i = 1 to m do
    θj := θj + α (y(i) − h(x(i))) xj(i)   (for every j)
  end for
} until convergence
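A sketch of this stochastic variant: the parameters are updated after each individual example rather than after a pass over the whole batch. Same hypothetical setup and arbitrary α as in the batch sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200
x = rng.uniform(0, 10, size=m)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=m)   # hypothetical data

X = np.column_stack([np.ones(m), x])   # x0 = 1 column for the intercept
theta = np.zeros(2)
alpha = 0.01

for epoch in range(50):                # "repeat until convergence"
    for i in range(m):                 # one training example at a time
        error = y[i] - X[i] @ theta
        theta += alpha * error * X[i]  # theta_j += alpha * (y_i - h(x_i)) * x_ij

print(theta)  # fluctuates around the least-squares estimates (~1.5, ~0.8)
```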
Assumptions of the linear regression model
1. There should be a linear and additive relationship between the dependent (response) variable
and the independent (predictor) variable(s). A linear relationship means that the change in
the response Y due to a one-unit change in X¹ is constant, regardless of the value of X¹. An
additive relationship means that the effect of X¹ on Y is independent of the other variables.
2. There should be no correlation between the residual (error) terms. The presence of
such correlation is known as autocorrelation.
3. The independent variables should not be correlated with one another. The presence of
such correlation is known as multicollinearity.
4. The error terms must have constant variance. This property is known as
homoskedasticity. The presence of non-constant variance is referred to as
heteroskedasticity.
5. The error terms must be normally distributed.
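A few rough, informal checks of assumptions 2–4 on a fitted model, sketched with NumPy on synthetic data; these are simple heuristics for illustration, not the formal diagnostic tests usually used.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))                                   # hypothetical predictors
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)                 # least-squares fit
residuals = y - Xd @ beta
fitted = Xd @ beta

# 2. Autocorrelation: lag-1 correlation of residuals should be near 0
print(np.corrcoef(residuals[:-1], residuals[1:])[0, 1])

# 3. Multicollinearity: off-diagonal feature correlations should be small
print(np.corrcoef(X, rowvar=False))

# 4. Homoskedasticity: residual variance should be similar across fitted values
low, high = fitted < np.median(fitted), fitted >= np.median(fitted)
print(residuals[low].var(), residuals[high].var())
```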
