Annotated Version
Machine Learning Course - CS-433

Linear Regression

Sept 15, 2020

minor changes by Martin Jaggi 2020, 2019, 2018, 2017, 2016; © Mohammad Emtiyaz Khan 2015


Last updated on: September 14, 2020
1 Model: Linear Regression

What is it?
Linear regression is a model that assumes a linear relationship between inputs and the output.

Why learn about linear regression?
Plenty of reasons: it is simple, easy to understand, the most widely used, and easily generalized to non-linear models. Most importantly, you can learn almost all fundamental concepts of ML with regression alone.
Simple linear regression
With only one input dimension, we get simple linear regression:

$y_n \approx f(x_n) := w_0 + w_1 x_{n1}$

Here, $w = (w_0, w_1)$ are the two parameters of the model. They describe $f$.
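As a quick illustration, here is a minimal sketch of evaluating this model; the parameter values are made up purely for the example:

```python
# Simple linear regression: y_n ≈ w0 + w1 * x_n1
# Hypothetical parameter values, chosen only for illustration.
w0, w1 = 1.0, 2.0

def f(xn1):
    """Prediction of the simple linear model for a scalar input."""
    return w0 + w1 * xn1

print(f(3.0))  # 1.0 + 2.0 * 3.0 = 7.0
```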

Multiple linear regression
If our data has multiple input dimensions, we obtain multiple linear regression:

$y_n \approx f(x_n) := w_0 + w_1 x_{n1} + \dots + w_D x_{nD} = w_0 + x_n^\top \begin{pmatrix} w_1 \\ \vdots \\ w_D \end{pmatrix} =: \tilde{x}_n^\top \tilde{w}$

Note that we add a tilde over the input vector, and also over the weights, to indicate that they now contain the additional offset term (a.k.a. bias term).
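In code, the tilde notation corresponds to prepending a constant 1 to the input vector so that $w_0$ acts as the offset. A minimal NumPy sketch (the weight and input values are invented for illustration):

```python
import numpy as np

D = 3
w_tilde = np.array([0.5, 1.0, -2.0, 0.25])  # [w0, w1, ..., wD], hypothetical values
x_n = np.array([1.5, 0.0, 4.0])             # one input example of dimension D

x_tilde = np.concatenate(([1.0], x_n))      # prepend 1 so w0 becomes the offset
y_hat = x_tilde @ w_tilde                   # x̃ᵀw̃ = w0 + w1*x_n1 + ... + wD*x_nD
print(y_hat)
```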


Learning / Estimation / Fitting
Given data, we would like to find $\tilde{w} = [w_0, w_1, \dots, w_D]$. This is called learning or estimating the parameters, or fitting the model. To do so, we need an optimization algorithm, which we will discuss in the chapter after the next.
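The optimization algorithm itself is deferred to a later chapter; purely as a preview, here is a sketch that fits $\tilde{w}$ by ordinary least squares using NumPy's built-in solver. The data is synthetic, invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 2
X = rng.normal(size=(N, D))                     # N data examples, D input dimensions
y = 1.0 + X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=N)  # synthetic targets

X_tilde = np.hstack([np.ones((N, 1)), X])       # add the column of 1s for the offset
w_tilde, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
print(w_tilde)   # estimates of [w0, w1, w2], close to [1, 2, -3]
```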

Additional Notes

Alternative when not using an 'offset' term
Above we have used D + 1 model parameters to fit data of dimension D. An alternative, also often used in practice, in particular for high-dimensional data, is to ignore the offset term $w_0$:

$y_n \approx f(x_n) := w_1 x_{n1} + \dots + w_D x_{nD} = x_n^\top w$

In this case, we have just D parameters to learn, instead of D + 1.
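A sketch of the same fit without the offset: the feature matrix is used directly and only D weights are learned (again with invented data):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 50, 2
X = rng.normal(size=(N, D))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=N)  # synthetic, no true offset

w, *_ = np.linalg.lstsq(X, y, rcond=None)       # just D parameters, no w0
print(w)                                        # close to [2, -3]
```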

As a warning, you should be aware that for some machine learning models, the number of weight parameters (elements of w) can in general be very different from D (the dimension of the input data). For an ML model where they do not coincide, think for example of a neural network (more details later).
Matrix multiplication
To go any further, one must revise matrix multiplication. Remember that multiplying an M × N matrix with an N × D matrix results in an M × D matrix. Also, two matrices of size M × N1 and N2 × M can only be multiplied when N1 = N2.
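These shape rules are easy to verify in NumPy:

```python
import numpy as np

M, N, D = 2, 3, 4
A = np.ones((M, N))
B = np.ones((N, D))
print((A @ B).shape)   # (M, D): an M×N matrix times an N×D matrix gives M×D

C = np.ones((5, M))    # inner dimensions N1 = 3 and N2 = 5 do not match...
try:
    A @ C              # ...so this product is undefined
except ValueError as e:
    print(e)
```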

The D > N Problem
[Handwritten annotation: a sketch of N data points versus D features, contrasting the classic setting with modern deep learning.]
Consider the following simple situation: You have N = 1 and you want to fit $y_1 \approx w_0 + w_1 x_{11}$, i.e. you want to find $w = (w_0, w_1)$ given the one pair $(y_1, x_{11})$. Is it possible to find such a line?

This problem is related to something called the D > N problem (in statistics typically named p > n). It means that the number of parameters exceeds the number of data examples; in other words, we have more variables than we have data information. For many models, such as linear regression, this makes the task under-determined. We say that the model is over-parameterized for the task.
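The under-determination is easy to see numerically: with N = 1 and two parameters, infinitely many lines pass exactly through the single data point. A minimal sketch with a made-up data point:

```python
# One data point (y1, x11): fit y1 ≈ w0 + w1 * x11.
y1, x11 = 2.0, 1.0

# For any choice of slope w1, picking w0 = y1 - w1 * x11 fits the point exactly,
# so the solution is not unique: the task is under-determined.
for w1 in [0.0, 1.0, -5.0]:
    w0 = y1 - w1 * x11
    print(f"w0={w0}, w1={w1}, prediction={w0 + w1 * x11}")  # always 2.0
```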

Using regularization is a way to avoid the issue described, which we will learn about later.
