Machine Learning Course - Kernel Regression
Changes by Martin Jaggi 2019, changes by Rüdiger Urbanke 2018, changes by Martin Jaggi 2016, 2017
© Mohammad Emtiyaz Khan 2015
Last updated on: October 31, 2019
Motivation
The ridge solution $w^\star \in \mathbb{R}^D$ has a counterpart $\alpha^\star \in \mathbb{R}^N$. Using duality, we will establish a relationship between $w^\star$ and $\alpha^\star$ which leads the way to kernels.
Ridge regression
Recall the ridge regression problem

$$\min_w \; \tfrac{1}{2}\|y - Xw\|^2 + \tfrac{\lambda}{2}\|w\|^2$$

Its solution

$$w^\star = \arg\min_w \; \tfrac{1}{2}\|y - Xw\|^2 + \tfrac{\lambda}{2}\|w\|^2$$

has the dual counterpart

$$\alpha^\star = \arg\max_\alpha \; -\tfrac{1}{2}\,\alpha^\top (XX^\top + \lambda I_N)\,\alpha + \alpha^\top y,$$

i.e. they both have the same optimal value. Also, we always have the correspondence mapping $w^\star = X^\top \alpha^\star$.
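The primal-dual correspondence above can be checked numerically. The sketch below (with an arbitrary small random problem; the sizes and seed are illustrative) solves the primal ridge problem in closed form, solves the dual first-order condition $\alpha^\star = (XX^\top + \lambda I_N)^{-1} y$, and verifies that $w^\star = X^\top \alpha^\star$:

```python
import numpy as np

# Hypothetical small problem: N samples, D features (values are illustrative).
rng = np.random.default_rng(0)
N, D, lam = 5, 3, 0.1
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

# Primal ridge solution: w* = (X^T X + lam I_D)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Maximizer of the dual objective -1/2 a^T (XX^T + lam I_N) a + a^T y,
# obtained by setting its gradient to zero: a* = (XX^T + lam I_N)^{-1} y
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)

# Correspondence mapping: w* = X^T a*
w_dual = X.T @ alpha
print(np.allclose(w_primal, w_dual))  # True
```

Note that the dual solve works with an $N \times N$ system instead of a $D \times D$ one, which is what makes the kernel view attractive when $D \gg N$.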
Kernel functions
The linear kernel is defined below:

$$K = XX^\top = \begin{pmatrix} x_1^\top x_1 & x_1^\top x_2 & \dots & x_1^\top x_N \\ x_2^\top x_1 & x_2^\top x_2 & \dots & x_2^\top x_N \\ \vdots & \vdots & \ddots & \vdots \\ x_N^\top x_1 & x_N^\top x_2 & \dots & x_N^\top x_N \end{pmatrix}$$
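The Gram matrix $K = XX^\top$ can be formed with a single matrix product; the toy data below is illustrative, and the loop is only there to confirm that entry $(i, j)$ equals $x_i^\top x_j$ as in the definition:

```python
import numpy as np

# Hypothetical data matrix X: rows are the samples x_1, ..., x_N.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

K = X @ X.T  # K[i, j] = x_i^T x_j

# Entry-by-entry check against the definition of the linear kernel.
N = X.shape[0]
K_loop = np.array([[X[i] @ X[j] for j in range(N)] for i in range(N)])
print(np.allclose(K, K_loop))  # True
```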
Examples of kernels
The linear kernel above is a special case of the polynomial kernel (of degree one). Another example is the Radial Basis Function (RBF) kernel:

$$\kappa(x, x') = \exp\left(-\tfrac{1}{2}(x - x')^\top (x - x')\right)$$
See more examples in Section 14.2
of Murphy’s book.