Linear Regression

Linear regression is a machine learning algorithm used for predictive modeling problems. It finds the linear relationship between a dependent variable (y) and one or more independent variables (x). The goal is to fit a linear equation to the observed data so that it can be used to predict the values of y for given values of x. The linear regression line is the line of best fit that minimizes the sum of the squared residuals between the observed and predicted y-values. Model parameters are estimated using the method of least squares.


Linear Regression

Machine Learning Algorithm


• Supervised machine learning algorithm
• Given a set of (xi, yi) pairs [input features and output variable] for a given
problem statement, a supervised model can be built using one of the
following approaches:
• Classification
• Regression
• The technique implemented depends on the type of problem
• If the requirement is to categorize input data into a fixed number of given classes,
classification is used
• If the requirement is to predict a numerical value, a regression model is used
Machine learning model
The data set is divided into training and testing parts in an 80:20 or 70:30 ratio. The training data (xi, yi)
is fed into the model for training. Once the model is trained, its performance is evaluated using the test
data: the xi of the test data are given to the trained model and the predicted output y’ is obtained. The
model is tuned to reduce the difference between y and y’, as in the sketch below.
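A minimal sketch of this train/test workflow, using scikit-learn (an assumption; any library or a hand-written split would do). The feature matrix X and target y here are synthetic, purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # hypothetical input feature xi
y = 3.0 * X[:, 0] + 2.0 + rng.normal(size=100)   # hypothetical output variable yi

# 80:20 split of the data set into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # train on (xi, yi)
y_pred = model.predict(X_test)                     # predicted output y'
print(mean_squared_error(y_test, y_pred))          # difference between y and y'
```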
Linear Regression
• What is Linear Regression?
• Linear Regression is used for predictive analysis. It is a technique
that explains the degree of relationship between two or more
variables (multiple regression, in that case) using a best-fit line / plane.
• Simple Linear Regression is used when we have one independent
variable (X) and one dependent variable (Y).
• The regression technique tries to fit a single line through a scatter plot.
The simplest form of regression, with one dependent and one
independent variable, is defined by the formula:
• Y = aX + b
Regression
• In regression the output is continuous
• The objective is to fit the line which best represents the data points in the problem space.
• Each estimated line represents a hypothesis function; we try to obtain the best hypothesis by reducing the error
• Many models could be used – the simplest is linear regression
• Fit the data with the best hyper-plane which "goes through" the points

[Figure: scatter plot with the dependent variable y (output) on the vertical axis and the independent variable x (input) on the horizontal axis]

Linear Regression models
• Given an input x, compute an output y. For example:
- Predict height from age
- Predict house price from house area
- Predict distance from wall from sensors
• ERROR: we may use a loss function that measures the squared error in the prediction of y(x) from x.
Types of Regression Models
Regression models are grouped by the number of features and the form of the fit:
• Simple (1 feature): linear or non-linear
• Multiple (2+ features): linear or non-linear
Simple Linear Regression Equation
[Figure: regression line E(y) plotted against x, showing the intercept b0 and the slope β1]
Linear Regression Model
• The relationship between the variables is a linear function:

Y = β0 + β1x1 + ϵ

where β0 is the population Y-intercept, β1 is the population slope, and ϵ is the random error.
A sample of 15 houses from the region:

House Number   Y: Actual Selling Price   X: House Size (100s ft²)
1              89.5                      20.0
2              79.9                      14.8
3              83.1                      20.5
4              56.9                      12.5
5              66.6                      18.0
6              82.5                      14.3
7              126.3                     27.5
8              79.3                      16.5
9              119.9                     24.3
10             87.6                      20.2
11             112.6                     22.0
12             120.8                     19.0
13             78.5                      12.3
14             74.3                      14.0
15             74.8                      16.7
Averages       88.84                     18.17
House price vs size
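A short sketch of fitting the line of best fit to the 15 sample houses in the table above, using NumPy's least-squares polyfit (an assumption; any least-squares routine would do).

```python
import numpy as np

# House size in 100s of ft² and actual selling price, taken from the table above
size = np.array([20.0, 14.8, 20.5, 12.5, 18.0, 14.3, 27.5, 16.5,
                 24.3, 20.2, 22.0, 19.0, 12.3, 14.0, 16.7])
price = np.array([89.5, 79.9, 83.1, 56.9, 66.6, 82.5, 126.3, 79.3,
                  119.9, 87.6, 112.6, 120.8, 78.5, 74.3, 74.8])

# Degree-1 least-squares fit: price ≈ slope * size + intercept
slope, intercept = np.polyfit(size, price, deg=1)
print(slope, intercept)
print(intercept + slope * 20.0)  # predicted price for a 2000 ft² house
```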
Linear Regression – Multiple Variables
Yi = b0 + b1X1 + b2X2 + … + bpXp + ε

• b0 is the intercept (i.e. the average value of Y if all the X’s are zero); bj is the slope for the jth variable Xj
Regression Model
• Our model assumes that E(Y | X = x) = b0 + b1x (the “population line”)

• Population line:     Yi = b0 + b1X1 + b2X2 + … + bpXp + ε
• Least squares line:  Ŷi = b̂0 + b̂1X1 + b̂2X2 + … + b̂pXp

• We use b̂0 through b̂p as guesses for b0 through bp, and Ŷi as a guess for Yi. The guesses will not be perfect.
Assumption
• The data may not form a perfect line.
• When we actually take a measurement (i.e., observe the data), we observe:
Yi = b0 + b1Xi + εi,
where εi is the random error associated with the ith observation.
The regression line
The least-squares regression line is the unique line such
that the sum of the squared vertical (y) distances between
the data points and the line is the smallest possible.
Criterion for choosing what line to draw:
method of least squares
• The method of least squares chooses the line (b̂0 and b̂1) that makes the sum of the squared residuals Σ εi² as small as possible
• It minimizes

Σᵢ₌₁ⁿ [yi − (b0 + b1xi)]²

for the given observations (xi, yi)

How do we "learn" parameters
• For the 2-d problem
Y = b0 + b1X

• To find the values of the coefficients which minimize the objective function, we
take the partial derivatives of the objective function (SSE) with respect to the
coefficients, set these to 0, and solve:

b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
b0 = (Σy − b1 Σx) / n
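A small sketch implementing these closed-form formulas directly from the sums (hypothetical toy data; plain Python is used so the arithmetic is visible).

```python
# Closed-form simple linear regression coefficients from the summation formulas above.
def fit_simple_linear(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

# Tiny example where y = 2x + 1 exactly:
b0, b1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # -> 1.0, 2.0
```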
Multiple Linear Regression
Y = b0 + b1X1 + b2X2 + … + bnXn

h(x) = Σᵢ₌₀ⁿ βi xi   (taking x0 = 1 for the intercept)

• There is a closed form which requires matrix
inversion, etc.
• There are iterative techniques to find the weights
• The delta rule (also called the LMS method) updates the weights
towards the objective of minimizing the SSE.
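A sketch of the closed form mentioned above (the normal equation β = (XᵀX)⁻¹Xᵀy), using NumPy on synthetic data; the data and the true coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X_raw = rng.normal(size=(n, p))                  # p input features
true_beta = np.array([4.0, 2.0, -1.0, 0.5])      # intercept + 3 slopes (made up)
y = true_beta[0] + X_raw @ true_beta[1:] + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), X_raw])         # add the x0 = 1 column
# Solve (X^T X) beta = X^T y; np.linalg.solve is preferred over an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [4.0, 2.0, -1.0, 0.5]
```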
Linear Regression
h(x) = Σᵢ₌₀ⁿ βi xi

How do we learn the parameters θ (the βi)?
• Make h(x) close to y for the available training examples.
• Define a cost function J(θ):

J(θ) = ½ Σᵢ₌₁ᵐ (h(x(i)) − y(i))²

• Find θ that minimizes J(θ).
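A small sketch of this cost function in NumPy (the data rows here are hypothetical; x includes the leading 1 for the intercept).

```python
import numpy as np

# J(theta) = (1/2) * sum_i (h(x_i) - y_i)^2 for the linear hypothesis h(x) = theta . x
def cost(theta, X, y):
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # [1, x] rows
y = np.array([3.0, 5.0, 7.0])
print(cost(np.array([1.0, 2.0]), X, y))  # 0.0: theta = (1, 2) fits exactly
print(cost(np.array([0.0, 0.0]), X, y))  # 41.5: far from the data
```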
LMS Algorithm
• Start a search algorithm (e.g. gradient descent) with an initial guess of θ.
• Repeatedly update θ to make J(θ) smaller, until it converges to a minimum:

βj := βj − α ∂J(θ)/∂βj

• J is a convex quadratic function, so it has a single global minimum; gradient descent
eventually converges to the global minimum.
• At each iteration the algorithm takes a step in the direction of steepest descent
(the negative direction of the gradient).
LMS Update Rule
• If you have only one training example (x, y):

∂J(θ)/∂θj = ∂/∂θj [ ½ (h(x) − y)² ]
          = 2 · ½ · (h(x) − y) · ∂/∂θj (h(x) − y)
          = (h(x) − y) · ∂/∂θj ( Σᵢ₌₀ⁿ θi xi − y )
          = (h(x) − y) xj

• For a single training example, this gives the update
rule:
βj := βj + α (y(i) − h(x(i))) xj(i)
m training examples
Repeat until convergence {
  θj := θj + α Σᵢ₌₁ᵐ (y(i) − h(x(i))) xj(i)   (for every j)
}

Batch Gradient Descent: looks at every example on each step, as in the sketch below.
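A sketch of batch gradient descent for linear regression in NumPy. The synthetic data, learning rate α and iteration count are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
x = rng.uniform(0, 10, size=m)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=m)   # hypothetical data

X = np.column_stack([np.ones(m), x])   # x0 = 1 column for the intercept
theta = np.zeros(2)
alpha = 1e-4

for _ in range(5000):
    # Full-batch step: theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij
    errors = y - X @ theta
    theta += alpha * (X.T @ errors)

print(theta)  # approaches the least-squares estimates (~1.5, ~0.8)
```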
Stochastic gradient descent
• Repeatedly run through the training set.
• Whenever a training point is encountered, update the parameters
according to the gradient of the error with respect to that training example
only.

Repeat {
  for i = 1 to m do
    θj := θj + α (y(i) − h(x(i))) xj(i)   (for every j)
  end for
} until convergence
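A sketch of this stochastic variant: the parameters are updated after each individual example rather than after a pass over the whole batch. Same hypothetical setup and arbitrary α as in the batch sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200
x = rng.uniform(0, 10, size=m)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=m)   # hypothetical data

X = np.column_stack([np.ones(m), x])   # x0 = 1 column for the intercept
theta = np.zeros(2)
alpha = 0.01

for epoch in range(50):                # "repeat until convergence"
    for i in range(m):                 # one training example at a time
        error = y[i] - X[i] @ theta
        theta += alpha * error * X[i]  # theta_j += alpha * (y_i - h(x_i)) * x_ij

print(theta)  # fluctuates around the least-squares estimates (~1.5, ~0.8)
```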
Assumptions of the linear regression model
1. There should be a linear and additive relationship between the dependent (response) variable
and the independent (predictor) variable(s). A linear relationship means that the change in
the response Y due to a one-unit change in X¹ is constant, regardless of the value of X¹. An
additive relationship means that the effect of X¹ on Y is independent of the other variables.
2. There should be no correlation between the residual (error) terms. The presence of
such correlation is known as autocorrelation.
3. The independent variables should not be correlated with one another. The presence of
such correlation is known as multicollinearity.
4. The error terms must have constant variance. This property is known as
homoskedasticity. The presence of non-constant variance is referred to as
heteroskedasticity.
5. The error terms must be normally distributed.
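A few rough, informal checks of assumptions 2–4 on a fitted model, sketched with NumPy on synthetic data; these are simple heuristics for illustration, not the formal diagnostic tests usually used.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 2))                                   # hypothetical predictors
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)                 # least-squares fit
residuals = y - Xd @ beta
fitted = Xd @ beta

# 2. Autocorrelation: lag-1 correlation of residuals should be near 0
print(np.corrcoef(residuals[:-1], residuals[1:])[0, 1])

# 3. Multicollinearity: off-diagonal feature correlations should be small
print(np.corrcoef(X, rowvar=False))

# 4. Homoskedasticity: residual variance should be similar across fitted values
low, high = fitted < np.median(fitted), fitted >= np.median(fitted)
print(residuals[low].var(), residuals[high].var())
```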
