Machine Learning Course - Kernel Regression
Changes by Martin Jaggi 2019, changes by Rüdiger Urbanke 2018, changes by Martin Jaggi 2016, 2017
© Mohammad Emtiyaz Khan 2015
Last updated on: October 31, 2019
Motivation
The ridge solution $w^\star \in \mathbb{R}^D$ has a counterpart $\alpha^\star \in \mathbb{R}^N$. Using duality, we will establish a relationship between $w^\star$ and $\alpha^\star$ which leads the way to kernels.
Ridge regression
Recall the ridge regression problem

$$\min_w \; \tfrac{1}{2}\|y - Xw\|^2 + \tfrac{\lambda}{2}\|w\|^2$$

Its solution

$$w^\star = \arg\min_w \; \tfrac{1}{2}\|y - Xw\|^2 + \tfrac{\lambda}{2}\|w\|^2$$

has the dual counterpart

$$\alpha^\star = \arg\max_\alpha \; -\tfrac{1}{2}\,\alpha^\top (XX^\top + \lambda I_N)\,\alpha + \alpha^\top y,$$

i.e. they both have the same optimal value. Also, we always have the correspondence mapping $w^\star = X^\top \alpha^\star$.
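The primal-dual correspondence above can be checked numerically. The sketch below (with an arbitrary small random problem; the sizes and seed are illustrative) solves the primal ridge problem in closed form, solves the dual first-order condition $\alpha^\star = (XX^\top + \lambda I_N)^{-1} y$, and verifies that $w^\star = X^\top \alpha^\star$:

```python
import numpy as np

# Hypothetical small problem: N samples, D features (values are illustrative).
rng = np.random.default_rng(0)
N, D, lam = 5, 3, 0.1
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

# Primal ridge solution: w* = (X^T X + lam I_D)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Maximizer of the dual objective -1/2 a^T (XX^T + lam I_N) a + a^T y,
# obtained by setting its gradient to zero: a* = (XX^T + lam I_N)^{-1} y
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)

# Correspondence mapping: w* = X^T a*
w_dual = X.T @ alpha
print(np.allclose(w_primal, w_dual))  # True
```

Note that the dual solve works with an $N \times N$ system instead of a $D \times D$ one, which is what makes the kernel view attractive when $D \gg N$.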
Kernel functions
The linear kernel is defined below:

$$K = XX^\top = \begin{pmatrix} x_1^\top x_1 & x_1^\top x_2 & \dots & x_1^\top x_N \\ x_2^\top x_1 & x_2^\top x_2 & \dots & x_2^\top x_N \\ \vdots & \vdots & \ddots & \vdots \\ x_N^\top x_1 & x_N^\top x_2 & \dots & x_N^\top x_N \end{pmatrix}$$
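The Gram matrix $K = XX^\top$ can be formed with a single matrix product; the toy data below is illustrative, and the loop is only there to confirm that entry $(i, j)$ equals $x_i^\top x_j$ as in the definition:

```python
import numpy as np

# Hypothetical data matrix X: rows are the samples x_1, ..., x_N.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

K = X @ X.T  # K[i, j] = x_i^T x_j

# Entry-by-entry check against the definition of the linear kernel.
N = X.shape[0]
K_loop = np.array([[X[i] @ X[j] for j in range(N)] for i in range(N)])
print(np.allclose(K, K_loop))  # True
```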
Examples of kernels
The linear kernel above is a special case of the polynomial kernel (of degree one). Another example is the Radial Basis Function (RBF) kernel:

$$\kappa(x, x') = \exp\left(-\tfrac{1}{2}(x - x')^\top (x - x')\right)$$
See more examples in Section 14.2
of Murphy’s book.