
Chapter 3: Linear Models for Regression

This document discusses linear models for regression, including linear basis function models that use polynomial and Gaussian basis functions. It also covers maximum likelihood and least-squares estimation for linear regression, as well as regularized least squares, including Lasso regularization. Finally, Bayesian linear regression is introduced, with a derivation of the posterior distribution and examples of how the posterior changes as more data is observed.


PATTERN RECOGNITION

AND MACHINE LEARNING


CHAPTER 3: LINEAR MODELS FOR REGRESSION
Linear Basis Function Models (1)

Example: Polynomial Curve Fitting

Determine the mapping from input x to target t.
Sum-of-Squares Error Function

Define the objective (error) function.

Find the w that minimizes the objective function.
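For reference, the polynomial model and sum-of-squares error used in this example, written in standard PRML notation:

\[ y(x, w) = \sum_{j=0}^{M} w_j x^j, \qquad E(w) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, w) - t_n \bigr\}^2 \]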
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial

Avoiding over-fitting: (1) use more training data; (2) add a penalty (regularization) term.


Linear Basis Function Models (2)
Generally

where φj(x) are known as basis functions.


Typically, φ0(x) = 1, so that w0 acts as a bias.
In the simplest case, we use linear basis functions: φd(x) = xd.
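Written out, the general linear basis function model referred to above is:

\[ y(x, w) = \sum_{j=0}^{M-1} w_j \phi_j(x) = w^{\mathsf T} \phi(x) \]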
Linear Basis Function Models (3)
Polynomial basis functions:

These are global: a small change in x affects all basis functions.
Linear Basis Function Models (4)
Gaussian basis functions:

These are local: a small change in x only affects nearby basis functions. μj and s control location and scale (width).
Linear Basis Function Models (5)
Sigmoidal basis functions:

where

These too are local: a small change in x only affects nearby basis functions. μj and s control location and scale (slope).
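The usual forms of these three basis function families (PRML notation, with centres μj and scale s) are:

\[ \phi_j(x) = x^j, \qquad \phi_j(x) = \exp\!\Bigl( -\frac{(x - \mu_j)^2}{2 s^2} \Bigr), \qquad \phi_j(x) = \sigma\!\Bigl( \frac{x - \mu_j}{s} \Bigr), \quad \sigma(a) = \frac{1}{1 + e^{-a}} \]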
Maximum Likelihood and Least Squares (1)

Assume observations from a deterministic function


with added Gaussian noise:
(β⁻¹ is the noise variance)
where

which is the same as saying,

Given observed inputs X and corresponding targets t, we obtain the likelihood function

Maximize the likelihood function.
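For reference, the noise model and resulting likelihood (PRML notation; β denotes the noise precision):

\[ t = y(x, w) + \epsilon, \qquad p(t \mid x, w, \beta) = \mathcal{N}\bigl( t \mid y(x, w), \beta^{-1} \bigr), \]
\[ p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \mathcal{N}\bigl( t_n \mid w^{\mathsf T} \phi(x_n), \beta^{-1} \bigr) \]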
Maximum Likelihood and Least Squares (2)
Taking the logarithm, we get
(Only under Gaussian noise does the likelihood take this form; in that case minimizing the sum-of-squares error is equivalent to maximum likelihood estimation.)

where

is the sum-of-squares error.
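In full, the log-likelihood and the sum-of-squares error term are:

\[ \ln p(\mathbf{t} \mid w, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln (2\pi) - \beta E_D(w), \qquad E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - w^{\mathsf T} \phi(x_n) \bigr\}^2 \]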


Maximum Likelihood and Least Squares (3)

Computing the gradient and setting it to zero yields

Solving for w, we get the maximum likelihood solution, written in terms of the Moore-Penrose pseudo-inverse, Φ†.

where

(Each (x, t) pair contributes one row of the design matrix Φ.)
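The resulting maximum likelihood solution and design matrix, in the same notation, are:

\[ w_{\mathrm{ML}} = \bigl( \Phi^{\mathsf T} \Phi \bigr)^{-1} \Phi^{\mathsf T} \mathbf{t} = \Phi^{\dagger} \mathbf{t}, \qquad \Phi_{nj} = \phi_j(x_n) \]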


Geometry of Least Squares
Consider
(Φ acts like a basis of the space; this is why the φj are called basis functions.)

N-dimensional
M-dimensional

S is spanned by φ1, …, φM (the columns of Φ).
wML minimizes the distance
between t and its orthogonal
projection on S, i.e. y.

Each value of y is a linear combination of the M vectors.
Sequential Learning
Data items considered one at a time (a.k.a.
online learning); use stochastic (sequential)
gradient descent:

This is known as the least-mean-squares (LMS)


algorithm. Issue: how to choose η?
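A minimal sketch of the LMS update w ← w + η (tn − wᵀφn) φn, assuming an illustrative polynomial basis and a hand-picked learning rate η (both are assumptions for the example, not values from the slides):

import numpy as np

def lms_update(w, phi_n, t_n, eta):
    # One sequential (stochastic gradient) step on the squared error for a single data point.
    return w + eta * (t_n - w @ phi_n) * phi_n

# Example: learn y = 1 + 2x - 0.5x^2 from noisy samples with basis phi(x) = (1, x, x^2).
rng = np.random.default_rng(0)
w = np.zeros(3)
eta = 0.1  # too large diverges, too small learns slowly
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    t = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.1)
    phi_n = np.array([1.0, x, x**2])
    w = lms_update(w, phi_n, t, eta)
# w is now close to (1.0, 2.0, -0.5)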
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting

Root-Mean-Square (RMS) Error:
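For reference, the RMS error is defined as:

\[ E_{\mathrm{RMS}} = \sqrt{2 E(w^{\star}) / N} \]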


Polynomial Coefficients
Regularized Least Squares (1)
Consider the error function:
Add a penalty term.

Data term + Regularization term

With the sum-of-squares error function and a


quadratic regularizer, we get

which is minimized by
(λ is called the regularization coefficient.)
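Written out, the quadratically regularized error and its minimizer are:

\[ E(w) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - w^{\mathsf T} \phi(x_n) \bigr\}^2 + \frac{\lambda}{2} w^{\mathsf T} w, \qquad w = \bigl( \lambda I + \Phi^{\mathsf T} \Phi \bigr)^{-1} \Phi^{\mathsf T} \mathbf{t} \]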
Regularized Least Squares (1)

Is it true that or
Regularized Least Squares (2)
With a more general regularizer, we have

Lasso (q = 1)          Quadratic (q = 2)
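The more general regularized error referred to above is (q = 1 gives the Lasso, q = 2 the quadratic regularizer):

\[ \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - w^{\mathsf T} \phi(x_n) \bigr\}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^{q} \]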
Regularized Least Squares (3)
Lasso tends to generate sparser solutions than a quadratic regularizer.
Understanding Lasso Regularizer

• The Lasso regularizer plays the role of thresholding: small coefficients are driven exactly to zero.
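A small worked case makes the thresholding interpretation concrete. Assuming an orthonormal design (ΦᵀΦ = I) and the λ/2 Σ|wj| penalty above (an assumption made for illustration), each Lasso coefficient is a soft-thresholded version of the maximum likelihood coefficient:

\[ \hat{w}_j = \operatorname{sign}\bigl( w_j^{\mathrm{ML}} \bigr) \, \max\bigl( |w_j^{\mathrm{ML}}| - \lambda/2, \; 0 \bigr) \]

so coefficients with |wj^ML| ≤ λ/2 are set exactly to zero, which is the source of the sparsity.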
Multiple Outputs (1)
Analogously to the single output case we have:

Given observed inputs X and corresponding targets T, we obtain the log-likelihood function
Multiple Outputs (2)
Maximizing with respect to W, we obtain

If we consider a single target variable, tk, we see that

where tk = (t1k, …, tNk)ᵀ; this is identical with the single-output case.
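For reference, the maximum likelihood solution for multiple outputs and its single-target form (T is the N × K target matrix):

\[ W_{\mathrm{ML}} = \bigl( \Phi^{\mathsf T} \Phi \bigr)^{-1} \Phi^{\mathsf T} T, \qquad w_k = \bigl( \Phi^{\mathsf T} \Phi \bigr)^{-1} \Phi^{\mathsf T} \mathbf{t}_k \]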
The Bias-Variance Decomposition (1)
Recall the expected squared loss,

where

The second term of E[L] corresponds to the noise


inherent in the random variable t.
What about the first term?
The Bias-Variance Decomposition (2)
Suppose we were given multiple data sets, each of
size N. Any particular data set, D, will give a
particular function y(x;D). We then have
The Bias-Variance Decomposition (3)
Taking the expectation over D yields

Bias describes the gap between the average prediction and the true values.
Variance describes the spread of the predictions around their own mean (over data sets).
The Bias-Variance Decomposition (4)
Thus we can write

where
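Written out, the decomposition is (h(x) = E[t | x] denotes the optimal regression function):

\[ \text{expected loss} = (\text{bias})^2 + \text{variance} + \text{noise}, \]
\[ (\text{bias})^2 = \int \bigl\{ \mathbb{E}_D[y(x; D)] - h(x) \bigr\}^2 p(x)\, dx, \qquad \text{variance} = \int \mathbb{E}_D\Bigl[ \bigl\{ y(x; D) - \mathbb{E}_D[y(x; D)] \bigr\}^2 \Bigr] p(x)\, dx, \]
\[ \text{noise} = \iint \bigl\{ h(x) - t \bigr\}^2 p(x, t)\, dx\, dt \]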
Bias-Variance Tradeoff
The Bias-Variance Decomposition (5)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Decomposition (6)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Decomposition (7)
Example: 25 data sets from the sinusoidal, varying
the degree of regularization, λ.
The Bias-Variance Trade-off
From these plots, we note that an over-regularized model (large λ) will have a high bias, while an under-regularized model (small λ) will have a high variance.
Bayesian Linear Regression (1)
Define a conjugate prior over w

Combining this with the likelihood function and using


results for marginal and conditional Gaussian
distributions, gives the posterior

where
Bayesian Linear Regression (2)
A common choice for the prior is

for which
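For reference, the posterior for a general Gaussian prior N(w | m0, S0), followed by the special case of the zero-mean isotropic prior p(w | α) = N(w | 0, α⁻¹ I):

\[ p(w \mid \mathbf{t}) = \mathcal{N}(w \mid m_N, S_N), \qquad m_N = S_N \bigl( S_0^{-1} m_0 + \beta \Phi^{\mathsf T} \mathbf{t} \bigr), \qquad S_N^{-1} = S_0^{-1} + \beta \Phi^{\mathsf T} \Phi, \]
\[ m_N = \beta S_N \Phi^{\mathsf T} \mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta \Phi^{\mathsf T} \Phi \]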

Next we consider an example …


Bayesian Linear Regression (3)
0 data points observed
Prior Data Space
Bayesian Linear Regression (4)
1 data point observed
Likelihood Posterior Data Space
Bayesian Linear Regression (5)
2 data points observed
Likelihood Posterior Data Space
Bayesian Linear Regression (6)
20 data points observed
Likelihood Posterior Data Space
Predictive Distribution (1)
Predict t for new values of x by integrating
over w:

where
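Written out, the predictive distribution for the zero-mean isotropic prior is:

\[ p(t \mid x, \mathbf{t}, \alpha, \beta) = \int p(t \mid x, w, \beta)\, p(w \mid \mathbf{t}, \alpha, \beta)\, dw = \mathcal{N}\bigl( t \mid m_N^{\mathsf T} \phi(x), \sigma_N^2(x) \bigr), \]
\[ \sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^{\mathsf T} S_N\, \phi(x) \]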
Predictive Distribution (1)
How do we compute the predictive distribution?
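A minimal sketch of the computation, assuming 9 Gaussian basis functions on [0, 1] and hand-picked values of α, β and the basis width s (all of these are illustrative assumptions, not values taken from the slides):

import numpy as np

alpha, beta = 2.0, 25.0          # assumed prior precision and noise precision
centres = np.linspace(0.0, 1.0, 9)
s = 0.1                          # assumed basis width

def design_matrix(x):
    # Gaussian basis functions evaluated at inputs x (shape [N]) -> matrix of shape [N, 9].
    return np.exp(-0.5 * (x[:, None] - centres[None, :])**2 / s**2)

def posterior(x, t):
    # Posterior N(w | m_N, S_N) for the zero-mean isotropic prior.
    Phi = design_matrix(x)
    S_N_inv = alpha * np.eye(len(centres)) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

def predictive(x_new, m_N, S_N):
    # Predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x).
    Phi_new = design_matrix(x_new)
    mean = Phi_new @ m_N
    var = 1.0 / beta + np.sum((Phi_new @ S_N) * Phi_new, axis=1)
    return mean, var

# Example: sinusoidal data, as in the following slides.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 25)
t_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.shape)
m_N, S_N = posterior(x_train, t_train)
mean, var = predictive(np.linspace(0.0, 1.0, 100), m_N, S_N)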
Predictive Distribution (2)
Example: Sinusoidal data, 9 Gaussian basis functions,
1 data point
Predictive Distribution (3)
Example: Sinusoidal data, 9 Gaussian basis functions,
2 data points
Predictive Distribution (4)
Example: Sinusoidal data, 9 Gaussian basis functions,
4 data points
Predictive Distribution (5)
Example: Sinusoidal data, 9 Gaussian basis functions,
25 data points
Salesman’s Problem
Given
• a consumer a described by a vector x
• a product b to sell with base cost c
• estimated price distribution of b in the mind
of a is Pr(t | x, w) = N(xᵀw, σ²)

What price should we offer to a?


Salesman’s Problem
Bayesian Model Comparison (1)
How do we choose the ‘right’ model?

Assume we want to compare models Mi, i=1, …,L,


using data D; this requires computing

Posterior  ∝  Prior  ×  Model evidence (marginal likelihood)
Bayesian Model Comparison (3)
For a model with parameters w, we get the
model evidence by marginalizing over w
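For reference, the posterior over models and the model evidence are:

\[ p(\mathcal{M}_i \mid \mathcal{D}) \propto p(\mathcal{M}_i)\, p(\mathcal{D} \mid \mathcal{M}_i), \qquad p(\mathcal{D} \mid \mathcal{M}_i) = \int p(\mathcal{D} \mid w, \mathcal{M}_i)\, p(w \mid \mathcal{M}_i)\, dw \]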
Bayesian Model Comparison (4)
For a given model with a
single parameter, w, con-
sider the approximation

where the posterior is


assumed to be sharply
peaked.
Bayesian Model Comparison (5)
Taking logarithms, we obtain

(The correction term is negative.)

With M parameters, all assumed to have the same ratio Δw_posterior / Δw_prior, we get

(The correction term is negative and linear in M.)
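Written out, the two approximations are:

\[ \ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid w_{\mathrm{MAP}}) + \ln\!\Bigl( \frac{\Delta w_{\text{posterior}}}{\Delta w_{\text{prior}}} \Bigr), \qquad \ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid w_{\mathrm{MAP}}) + M \ln\!\Bigl( \frac{\Delta w_{\text{posterior}}}{\Delta w_{\text{prior}}} \Bigr) \]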


Bayesian Model Comparison (5)

Data matching vs. model complexity


Bayesian Model Comparison (6)
Matching data and model complexity
What You Should Know
• Least squares and its maximum likelihood interpretation
• Sequential learning for regression
• Regularization and its effect
• Bayesian regression
• Predictive distribution
• Bias variance tradeoff
• Bayesian model selection
