
LINEAR REGRESSION

J. Elder

CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Credits

Some of these slides were sourced and/or modified from:
Christopher Bishop, Microsoft UK

Linear Regression Topics

What is linear regression?
Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

What is Linear Regression?

In classification, we seek to identify the categorical class Ck associated with a given input vector x.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
If y is a vector, we call this multiple regression.
We will focus on the case where y is a scalar.

Notation:
y will denote the continuous model of the dependent variable.
t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).

Where is the Linear in Linear Regression?

In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:

$$ y = y(\mathbf{x}, \mathbf{w}) $$

The regression is linear if y is linear in w. In other words, we can express y as

$$ y = \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}) $$

where φ(x) is some (potentially nonlinear) function of x.

Linear Basis Function Models

Generally,

$$ y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) $$

where the φ_j(x) are known as basis functions.
Typically, φ_0(x) = 1, so that w_0 acts as a bias.
In the simplest case, we use linear basis functions: φ_d(x) = x_d.
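As a concrete illustration, here is a minimal Python/NumPy sketch (our own function and variable names, not code from the course) that builds a polynomial design matrix and solves for w by least squares:

```python
import numpy as np

def design_matrix(x, M):
    """N x M polynomial design matrix: phi_j(x) = x**j, j = 0..M-1.
    Column j = 0 is all ones, so w[0] acts as the bias."""
    return np.vander(x, M, increasing=True)

# Toy data: noisy samples of a smooth function
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

Phi = design_matrix(x, M=4)                  # cubic model
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # least-squares solution for w
print(w)                                     # fitted weights
```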


Example: Polynomial Bases

Polynomial basis functions:

$$ \phi_j(x) = x^j $$

These are global:
A small change in x affects all basis functions.
A small change in a basis function affects y for all x.

Example: Polynomial Curve Fitting

[Figure: training data for the polynomial curve-fitting example.]

Sum-of-Squares Error Function

[Figure: the sum-of-squares error between the fitted curve and the data points.]
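In standard form (cf. Bishop, PRML, §1.1), the sum-of-squares error is

$$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 $$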

1st Order Polynomial

[Figure: 1st order polynomial fit to the data.]

3rd Order Polynomial

[Figure: 3rd order polynomial fit to the data.]

9th Order Polynomial

[Figure: 9th order polynomial fit to the data.]

Regularization

Penalize large coefficient values.
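In standard form (cf. Bishop, PRML, §1.1), the regularized error is

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\lambda}{2} \left\| \mathbf{w} \right\|^2 $$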

Regularization

[Figures: 9th order polynomial fits for different values of the regularization coefficient λ.]

Probabilistic View of Curve Fitting

Why least squares?

Model the noise (the deviation of the data from the model) as i.i.d. Gaussian, where β ≡ 1/σ² is the precision of the noise.
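In standard form (cf. Bishop, PRML, §1.2.5), the noise model is

$$ p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(x, \mathbf{w}), \beta^{-1} \right) $$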

Maximum Likelihood

We determine w_ML by minimizing the squared error E(w). Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.

Maximum Likelihood

We determine w_ML by minimizing the squared error E(w). Now, given w_ML, we can estimate the variance of the noise:
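In standard form (cf. Bishop, PRML, §1.2.5):

$$ \frac{1}{\beta_{ML}} = \sigma_{ML}^2 = \frac{1}{N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}_{ML}) - t_n \right)^2 $$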

Predictive Distribution

[Figure: the generating function, the observed data, the maximum likelihood prediction, and the posterior over t.]

MAP: A Step towards Bayes

Prior knowledge about probable values of w can be incorporated into the regression.
Now the posterior over w is proportional to the product of the likelihood and the prior.
The result is to introduce a new quadratic term in w into the error function to be minimized.
Thus regularized (ridge) regression reflects a zero-mean isotropic Gaussian prior on the weights.
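In standard form (cf. Bishop, PRML, §1.2.5):

$$ p(\mathbf{w} \mid \alpha) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I} \right), \qquad p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) \, p(\mathbf{w} \mid \alpha) $$

Minimizing the negative log posterior is then equivalent to minimizing

$$ \frac{\beta}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2 + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} $$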


Gaussian Bases

Gaussian basis functions: think of these as interpolation functions.

These are local:
A small change in x affects only nearby basis functions.
A small change in a basis function affects y only for nearby x.
μ_j and s control location and scale (width).
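In standard form (cf. Bishop, PRML, §3.1), the Gaussian basis is

$$ \phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right) $$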


Maximum Likelihood and Linear Least Squares

Assume observations from a deterministic function with added Gaussian noise; equivalently, each target is Gaussian-distributed about the model prediction. Given observed inputs and targets, we obtain the likelihood function below.
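In standard form (cf. Bishop, PRML, §3.1.1), the noise model is

$$ t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \qquad p(\epsilon \mid \beta) = \mathcal{N}\left( \epsilon \mid 0, \beta^{-1} \right), $$

which is the same as saying

$$ p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right). $$

Given observed inputs X = {x_1, ..., x_N} and targets t = (t_1, ..., t_N)^T, the likelihood is

$$ p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left( t_n \mid \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \right) $$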

Maximum Likelihood and Linear Least Squares

Taking the logarithm, we get the log-likelihood below, in which E_D(w) is the sum-of-squares error.
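In standard form (cf. Bishop, PRML, §3.1.1):

$$ \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w}) $$

$$ E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 $$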

Maximum Likelihood and Least Squares

Computing the gradient of the log-likelihood and setting it to zero, then solving for w, yields the closed-form solution below, expressed via the Moore-Penrose pseudo-inverse.
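In standard form (cf. Bishop, PRML, §3.1.1), the gradient is

$$ \nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right) \boldsymbol{\phi}(\mathbf{x}_n)^{\mathsf{T}} $$

and setting it to zero and solving for w gives

$$ \mathbf{w}_{ML} = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} \equiv \boldsymbol{\Phi}^{\dagger} \mathbf{t} $$

where Φ is the N × M design matrix with entries Φ_nj = φ_j(x_n) and Φ† is the Moore-Penrose pseudo-inverse. A one-line Python/NumPy sketch (our illustration, not course code):

```python
import numpy as np

def fit_ml(Phi, t):
    """w_ML via the Moore-Penrose pseudo-inverse. (np.linalg.lstsq is the
    numerically preferred route to the same least-squares solution.)"""
    return np.linalg.pinv(Phi) @ t
```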

End of Lecture 8


Regularized Least Squares

Consider the error function: data term + regularization term, where λ is called the regularization coefficient.

With the sum-of-squares error function and a quadratic regularizer, the minimizing w has the closed form below; thus the name ridge regression.
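In standard form (cf. Bishop, PRML, §3.1.4):

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} $$

which is minimized by

$$ \mathbf{w} = \left( \lambda \mathbf{I} + \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} $$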

Regularized Least Squares

With a more general regularizer, we have the family below, where q = 2 gives the quadratic regularizer and q = 1 gives the lasso (least absolute shrinkage and selection operator).
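In standard form (cf. Bishop, PRML, §3.1.4):

$$ \tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q $$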

Regularized Least Squares

Lasso generates sparse solutions.

[Figure: iso-contours of the data term E_D(w) and an iso-contour of the regularization term E_W(w), for the quadratic regularizer and for the lasso.]

Solving Regularized Systems

Quadratic regularization has the advantage that the solution is closed form.
Non-quadratic regularizers generally do not have closed-form solutions.
Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
We will discuss quadratic programming when we cover SVMs.


Multiple Outputs

Analogous to the single-output case: given observed inputs and targets, we obtain the log-likelihood function below.
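In standard form (cf. Bishop, PRML, §3.1.5), with K outputs collected in the matrix T:

$$ \ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}\left( \mathbf{t}_n \mid \mathbf{W}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \mathbf{I} \right) = \frac{NK}{2} \ln\left( \frac{\beta}{2\pi} \right) - \frac{\beta}{2} \sum_{n=1}^{N} \left\| \mathbf{t}_n - \mathbf{W}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\|^2 $$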

Multiple Outputs

Maximizing with respect to W, we obtain the solution below. If we consider a single target variable t_k, we see that the problem decouples, and each w_k is identical with the single-output case.
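In standard form (cf. Bishop, PRML, §3.1.5):

$$ \mathbf{W}_{ML} = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{T}, \qquad \mathbf{w}_k = \left( \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t}_k $$

where t_k is the vector of observations of the k-th target.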

Some Useful MATLAB Functions

polyfit
Least-squares fit of a polynomial of specified order to given data.

regress
More general function that computes linear weights for a least-squares fit.
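For readers working in Python rather than MATLAB, rough equivalents (our suggestion, not from the slides) are np.polyfit and np.linalg.lstsq:

```python
import numpy as np

x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(20)

coeffs = np.polyfit(x, t, deg=3)             # like polyfit: cubic least-squares fit
Phi = np.vander(x, 4, increasing=True)       # general design matrix
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # like regress: general linear weights
```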

Linear Regression Topics


Probability & Bayesian Inference

39

What is linear regression?


Example: polynomial curve fitting
Other basis families
Solving linear regression problems
Regularized regression
Multiple linear regression
Bayesian linear regression

CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

J. Elder

Bayesian Linear Regression

Rev. Thomas Bayes, 1702 - 1761

Bayesian Linear Regression

Define a conjugate prior over w. Combining this with the likelihood function, and using results for marginal and conditional Gaussian distributions, gives the posterior below.
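In standard form (cf. Bishop, PRML, §3.3.1), with prior p(w) = N(w | m_0, S_0), the posterior is

$$ p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right) $$

where

$$ \mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} \right), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} $$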

Bayesian Linear Regression

A common choice for the prior is a zero-mean isotropic Gaussian, for which the posterior takes the form below. Thus m_N represents the ridge regression solution with λ = α/β.

Next we consider an example.
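In standard form (cf. Bishop, PRML, §3.3.1):

$$ p(\mathbf{w} \mid \alpha) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I} \right) $$

for which

$$ \mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi} $$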

Bayesian Linear Regression

0 data points observed.

[Figure: the prior over w, and samples from it shown in data space.]

Bayesian Linear Regression

1 data point observed.

[Figure: the likelihood for (x1, t1), the resulting posterior over w, and samples shown in data space.]

Bayesian Linear Regression

2 data points observed.

[Figure: the likelihood for (x2, t2), the resulting posterior over w, and samples shown in data space.]

Bayesian Linear Regression

20 data points observed.

[Figure: the likelihood for (x20, t20), the resulting posterior over w, and samples shown in data space.]

Predictive Distribution

Predict t for new values of x by integrating over w, as shown below.
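In standard form (cf. Bishop, PRML, §3.3.2):

$$ p(t \mid \mathbf{t}, \alpha, \beta) = \int \mathcal{N}\left( t \mid \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}), \beta^{-1} \right) \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right) d\mathbf{w} = \mathcal{N}\left( t \mid \mathbf{m}_N^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}) \right) $$

where

$$ \sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x}) $$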

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 1 data point.

Notice how much bigger our uncertainty is relative to the ML method!

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 2 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 4 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]

Predictive Distribution

Example: sinusoidal data, 9 Gaussian basis functions, 25 data points.

[Figure: the predictive distribution p(t | t, α, β), its mean E[t | t, α, β], and samples of y(x, w).]
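A minimal Python/NumPy sketch of this kind of example (our own illustration; the basis centers and the values of α, β, and s are assumptions, not taken from the slides):

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """N x M matrix of Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for a zero-mean isotropic prior."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(0)
alpha, beta, s = 2.0, 25.0, 0.1          # assumed hyperparameters
centers = np.linspace(0, 1, 9)           # 9 Gaussian basis functions

x = rng.uniform(0, 1, 4)                 # 4 observed data points
t = np.sin(2 * np.pi * x) + rng.normal(0, beta ** -0.5, x.size)

m_N, S_N = posterior(gaussian_basis(x, centers, s), t, alpha, beta)

# Predictive mean and variance sigma_N^2(x) on a grid
x_star = np.linspace(0, 1, 100)
Phi_star = gaussian_basis(x_star, centers, s)
mean = Phi_star @ m_N
var = 1 / beta + np.sum(Phi_star @ S_N * Phi_star, axis=1)
```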

Equivalent Kernel

The predictive mean can be written as a weighted sum of the training data target values t_n, with the weights given by the equivalent kernel, or smoother matrix.
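In standard form (cf. Bishop, PRML, §3.3.3):

$$ y(\mathbf{x}, \mathbf{m}_N) = \mathbf{m}_N^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}) = \beta \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t} = \sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n) t_n $$

where the equivalent kernel is

$$ k(\mathbf{x}, \mathbf{x}') = \beta \boldsymbol{\phi}(\mathbf{x})^{\mathsf{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x}') $$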

Equivalent Kernel

The weight of t_n depends on the distance between x and x_n; nearby x_n carry more weight.

