
LINEAR REGRESSION

J. Elder

CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Credits

Some of these slides were sourced and/or modified from:
Christopher Bishop, Microsoft UK

Linear Regression Topics

What is linear regression?
Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

What is Linear Regression?

In classification, we seek to identify the categorical class Ck associated with a given input vector x.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
If y is a vector, we call this multiple regression.
We will focus on the case where y is a scalar.

Notation:
y will denote the continuous model of the dependent variable.
t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).

Where is the Linear in Linear Regression?

In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:

$$ y = y(\mathbf{x}, \mathbf{w}) $$

The regression is linear if y is linear in w. In other words, we can express y as

$$ y = \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}) $$

where φ(x) is some (potentially nonlinear) function of x.

Linear Basis Function Models

Generally,

$$ y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) $$

where the φ_j(x) are known as basis functions.
Typically, φ_0(x) = 1, so that w_0 acts as a bias.
In the simplest case, we use linear basis functions: φ_d(x) = x_d.
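As a concrete illustration, here is a minimal Python/NumPy sketch (our own function and variable names, not code from the course) that builds a polynomial design matrix and solves for w by least squares:

```python
import numpy as np

def design_matrix(x, M):
    """N x M polynomial design matrix: phi_j(x) = x**j, j = 0..M-1.
    Column j = 0 is all ones, so w[0] acts as the bias."""
    return np.vander(x, M, increasing=True)

# Toy data: noisy samples of a smooth function
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

Phi = design_matrix(x, M=4)                  # cubic model
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # least-squares solution for w
print(w)                                     # fitted weights
```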


Example: Polynomial Bases

Polynomial basis functions:

$$ \phi_j(x) = x^j $$

These are global:
A small change in x affects all basis functions.
A small change in a basis function affects y for all x.

Example: Polynomial Curve Fitting

[Figure: training data for the polynomial curve-fitting example.]

Sum-of-Squares Error Function

[Figure: the sum-of-squares error between the fitted curve and the data points.]
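In standard form (cf. Bishop, PRML, §1.1), the sum-of-squares error is

$$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 $$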

1st Order Polynomial

[Figure: 1st order polynomial fit to the data.]

3rd Order Polynomial

[Figure: 3rd order polynomial fit to the data.]

9th Order Polynomial

[Figure: 9th order polynomial fit to the data.]

Regularization

Penalize large coefficient values.
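In standard form (cf. Bishop, PRML, §1.1), the regularized error is

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\lambda}{2} \left\| \mathbf{w} \right\|^2 $$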

Regularization

[Figures: 9th order polynomial fits for different values of the regularization coefficient λ.]

Probabilistic View of Curve Fitting

Why least squares?

Model the noise (the deviation of the data from the model) as i.i.d. Gaussian, where β ≡ 1/σ² is the precision of the noise.
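In standard form (cf. Bishop, PRML, §1.2.5), the noise model is

$$ p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(x, \mathbf{w}), \beta^{-1} \right) $$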

Maximum Likelihood

We determine w_ML by minimizing the squared error E(w). Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.

Maximum Likelihood

We determine w_ML by minimizing the squared error E(w). Now, given w_ML, we can estimate the variance of the noise:
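In standard form (cf. Bishop, PRML, §1.2.5):

$$ \frac{1}{\beta_{ML}} = \sigma_{ML}^2 = \frac{1}{N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}_{ML}) - t_n \right)^2 $$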

Predictive Distribution

[Figure: the generating function, the observed data, the maximum likelihood prediction, and the posterior over t.]

MAP: A Step towards Bayes

Prior knowledge about probable values of w can be incorporated into the regression.
Now the posterior over w is proportional to the product of the likelihood and the prior.
The result is to introduce a new quadratic term in w into the error function to be minimized.
Thus regularized (ridge) regression reflects a zero-mean isotropic Gaussian prior on the weights.
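In standard form (cf. Bishop, PRML, §1.2.5):

$$ p(\mathbf{w} \mid \alpha) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I} \right), \qquad p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) \, p(\mathbf{w} \mid \alpha) $$

Minimizing the negative log posterior is then equivalent to minimizing

$$ \frac{\beta}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} $$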


Gaussian Bases

Gaussian basis functions: think of these as interpolation functions.

These are local:
A small change in x affects only nearby basis functions.
A small change in a basis function affects y only for nearby x.
μ_j and s control location and scale (width).
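In standard form (cf. Bishop, PRML, §3.1), the Gaussian basis is

$$ \phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right) $$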


Maximum Likelihood and Linear Least Squares

Assume observations from a deterministic function with added Gaussian noise; equivalently, each target is Gaussian-distributed about the model prediction. Given observed inputs and targets, we obtain the likelihood function below.
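In standard form (cf. Bishop, PRML, §3.1.1), the noise model is

$$ t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \qquad p(\epsilon \mid \beta) = \mathcal{N}\left( \epsilon \mid 0, \beta^{-1} \right), $$

which is the same as saying

$$ p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right). $$

Given observed inputs X = {x_1, ..., x_N} and targets t = (t_1, ..., t_N)^T, the likelihood is

$$ p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left( t_n \mid \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \right) $$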

Maximum Likelihood and Linear Least Squares

Taking the logarithm, we get the log-likelihood below, in which E_D(w) is the sum-of-squares error.
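In standard form (cf. Bishop, PRML, §3.1.1):

$$ \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w}) $$

$$ E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 $$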

Maximum Likelihood and Least Squares

Computing the gradient of the log-likelihood and setting it to zero, then solving for w, yields the closed-form solution below, expressed via the Moore-Penrose pseudo-inverse.
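In standard form (cf. Bishop, PRML, §3.1.1), the gradient is

$$ \nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right) \boldsymbol{\phi}(\mathbf{x}_n)^{\mathsf{T}} $$

and setting it to zero and solving for w gives

$$ \mathbf{w}_{ML} = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} \equiv \boldsymbol{\Phi}^{\dagger} \mathbf{t} $$

where Φ is the N × M design matrix with entries Φ_nj = φ_j(x_n) and Φ† is the Moore-Penrose pseudo-inverse. A one-line Python/NumPy sketch (our illustration, not course code):

```python
import numpy as np

def fit_ml(Phi, t):
    """w_ML via the Moore-Penrose pseudo-inverse. (np.linalg.lstsq is the
    numerically preferred route to the same least-squares solution.)"""
    return np.linalg.pinv(Phi) @ t
```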

End of Lecture 8


Regularized Least Squares

Consider the error function: data term + regularization term, where λ is called the regularization coefficient.

With the sum-of-squares error function and a quadratic regularizer, the minimizing w has the closed form below; thus the name ridge regression.
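In standard form (cf. Bishop, PRML, §3.1.4):

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} $$

which is minimized by

$$ \mathbf{w} = \left( \lambda \mathbf{I} + \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} $$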

Regularized Least Squares

With a more general regularizer, we have the family below, where q = 2 gives the quadratic regularizer and q = 1 gives the lasso (least absolute shrinkage and selection operator).
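In standard form (cf. Bishop, PRML, §3.1.4):

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q $$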

Regularized Least Squares

Lasso generates sparse solutions.

[Figure: iso-contours of the data term E_D(w) and an iso-contour of the regularization term E_W(w), for the quadratic regularizer and for the lasso.]

Solving Regularized Systems

Quadratic regularization has the advantage that the solution is closed form.
Non-quadratic regularizers generally do not have closed-form solutions.
Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
We will discuss quadratic programming when we cover SVMs.


Multiple Outputs

Analogous to the single-output case: given observed inputs and targets, we obtain the log-likelihood function below.
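In standard form (cf. Bishop, PRML, §3.1.5), with K outputs collected in the matrix T:

$$ \ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}\left( \mathbf{t}_n \mid \mathbf{W}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \mathbf{I} \right) = \frac{NK}{2} \ln\left( \frac{\beta}{2\pi} \right) - \frac{\beta}{2} \sum_{n=1}^{N} \left\| \mathbf{t}_n - \mathbf{W}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\|^2 $$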

Multiple Outputs

Maximizing with respect to W, we obtain the solution below. If we consider a single target variable t_k, we see that the problem decouples, and each w_k is identical with the single-output case.
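In standard form (cf. Bishop, PRML, §3.1.5):

$$ \mathbf{W}_{ML} = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{T}, \qquad \mathbf{w}_k = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t}_k $$

where t_k is the vector of observations of the k-th target.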

Some Useful MATLAB Functions

polyfit
Least-squares fit of a polynomial of specified order to given data.

regress
More general function that computes linear weights for a least-squares fit.
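For readers working in Python rather than MATLAB, rough equivalents (our suggestion, not from the slides) are np.polyfit and np.linalg.lstsq:

```python
import numpy as np

x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(20)

coeffs = np.polyfit(x, t, deg=3)             # like polyfit: cubic least-squares fit
Phi = np.vander(x, 4, increasing=True)       # general design matrix
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # like regress: general linear weights
```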

Linear Regression Topics


Probability & Bayesian Inference

39

What is linear regression?


Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

J. Elder

Bayesian Linear Regression

Rev. Thomas Bayes, 1702 - 1761

Bayesian Linear Regression

Define a conjugate prior over w. Combining this with the likelihood function, and using results for marginal and conditional Gaussian distributions, gives the posterior below.
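In standard form (cf. Bishop, PRML, §3.3.1), with prior p(w) = N(w | m_0, S_0), the posterior is

$$ p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right) $$

where

$$ \mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} \right), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} $$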

Bayesian Linear Regression

A common choice for the prior is a zero-mean isotropic Gaussian, for which the posterior takes the form below. Thus m_N represents the ridge regression solution with λ = α/β.

Next we consider an example.
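In standard form (cf. Bishop, PRML, §3.3.1):

$$ p(\mathbf{w} \mid \alpha) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I} \right) $$

for which

$$ \mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} $$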

Bayesian Linear Regression

0 data points observed.

[Figure: the prior over w, and samples from it shown in data space.]

Bayesian Linear Regression

1 data point observed.

[Figure: the likelihood for (x1, t1), the resulting posterior over w, and samples shown in data space.]

Bayesian Linear Regression

2 data points observed.

[Figure: the likelihood for (x2, t2), the resulting posterior over w, and samples shown in data space.]

Bayesian Linear Regression

20 data points observed.

[Figure: the likelihood for (x20, t20), the resulting posterior over w, and samples shown in data space.]

Predictive Distribution

Predict t for new values of x by integrating over w, as shown below.
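In standard form (cf. Bishop, PRML, §3.3.2):

$$ p(t \mid \mathbf{t}, \alpha, \beta) = \int \mathcal{N}\left( t \mid \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}), \beta^{-1} \right) \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right) d\mathbf{w} = \mathcal{N}\left( t \mid \mathbf{m}_N^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}) \right) $$

where

$$ \sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x}) $$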

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 1 data point.

Notice how much bigger our uncertainty is relative to the ML method!

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 2 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 4 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 25 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]
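A minimal Python/NumPy sketch of this kind of example (our own illustration; the basis centers and the values of α, β, and s are assumptions, not taken from the slides):

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """N x M matrix of Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for a zero-mean isotropic prior."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(0)
alpha, beta, s = 2.0, 25.0, 0.1          # assumed hyperparameters
centers = np.linspace(0, 1, 9)           # 9 Gaussian basis functions

x = rng.uniform(0, 1, 4)                 # 4 observed data points
t = np.sin(2 * np.pi * x) + rng.normal(0, beta ** -0.5, x.size)

m_N, S_N = posterior(gaussian_basis(x, centers, s), t, alpha, beta)

# Predictive mean and variance sigma_N^2(x) on a grid
x_star = np.linspace(0, 1, 100)
Phi_star = gaussian_basis(x_star, centers, s)
mean = Phi_star @ m_N
var = 1 / beta + np.sum(Phi_star @ S_N * Phi_star, axis=1)
```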

Equivalent Kernel

The predictive mean can be written as a weighted sum of the training data target values t_n, with the weights given by the equivalent kernel, or smoother matrix.
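In standard form (cf. Bishop, PRML, §3.3.3):

$$ y(\mathbf{x}, \mathbf{m}_N) = \mathbf{m}_N^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}) = \beta \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} = \sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n) t_n $$

where the equivalent kernel is

$$ k(\mathbf{x}, \mathbf{x}') = \beta \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x}') $$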

Equivalent Kernel

The weight of t_n depends on the distance between x and x_n; nearby x_n carry more weight.

