
DS303: Introduction to Machine Learning

Kernel Methods

Manjesh K. Hanawal
Limitation of Half Spaces
Consider the domain points {−10, −9, . . . , −1, 0, 1, . . . , 9, 10},
where the labels are:
▶ +1 for all x such that |x| > 2.
▶ −1 otherwise.
This data cannot be separated by the usual halfspaces.
Also consider the following 2D dataset.
[Figure: a 2D dataset plotted on the x1–x2 axes, with both coordinates ranging roughly over −2 to 2.]
This dataset too cannot be separated using halfspaces.


Feature Mapping

Coming back to the first problem discussed, we need to find some transformation.
First define a mapping ψ : R → R² as follows:

ψ(x) = (x, x²). (1)

We use the term feature space to denote the range of ψ. After applying ψ, the data can be easily explained using the half-space:

h(x) = sign(⟨w, ψ(x)⟩ − b), (2)

where w = (0, 1) and b = 5.
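As a quick numerical check (a minimal NumPy sketch; the helper names psi and h are ours, not from the slides), this half-space labels every domain point correctly after the mapping:

import numpy as np

def psi(x):
    # Feature map of Equation 1: psi(x) = (x, x^2).
    return np.array([x, x ** 2])

def h(x, w=np.array([0.0, 1.0]), b=5.0):
    # Half-space predictor of Equation 2: h(x) = sign(<w, psi(x)> - b).
    return np.sign(np.dot(w, psi(x)) - b)

xs = np.arange(-10, 11)                            # the domain {-10, ..., 10}
labels = np.where(np.abs(xs) > 2, 1, -1)           # +1 iff |x| > 2
print(all(h(x) == y for x, y in zip(xs, labels)))  # True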



The basic paradigm is as follows:
1. Given some domain set X and a learning task, choose a
mapping ψ : X → F, for some feature space F, that will
usually be R^n for some n.
2. Given a sequence of labeled examples, S = (x_1, y_1), . . . , (x_m, y_m), create the image sequence:

Ŝ = (ψ(x_1), y_1), . . . , (ψ(x_m), y_m).

3. Train a linear predictor h over Ŝ.


4. Predict the label of a test point, x, to be h(ψ(x)).
The success of this paradigm depends on how good our function ψ is.
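As an illustrative sketch of these four steps on the earlier one-dimensional example (our own code, assuming NumPy and scikit-learn are available; LinearSVC is just one convenient linear predictor for step 3, any other would do):

import numpy as np
from sklearn.svm import LinearSVC

# Step 1: choose a feature map psi : X -> F (here F = R^2, as in Equation 1).
psi = lambda x: np.array([x, x ** 2])

# Step 2: map the labeled sample S into the image sequence S_hat.
xs = np.arange(-10, 11)
ys = np.where(np.abs(xs) > 2, 1, -1)
S_hat = np.stack([psi(x) for x in xs])

# Step 3: train a linear predictor h over S_hat.
h = LinearSVC(C=10.0).fit(S_hat, ys)

# Step 4: predict the label of a test point x as h(psi(x)).
print(h.predict(psi(7).reshape(1, -1)))        # expected [1], since |7| > 2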



Polynomial Feature Mapping and Half-Spaces

The prediction of a standard half-space classifier is based on a linear mapping x ↦ ⟨w, x⟩. However, certain datasets require nonlinear decision boundaries, which can be achieved using polynomial feature mappings.
Example: Consider a polynomial mapping of degree k:

p(x) = Σ_{j=0}^{k} w_j x^j, (3)

which can be rewritten as p(x) = ⟨w, ψ(x)⟩, where

ψ : R → R^{k+1}, x ↦ (1, x, x², . . . , x^k). (4)

This transformation allows a linear classifier to separate data that was not linearly separable in the original space.
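A minimal sketch (our own, in NumPy) showing that evaluating the polynomial of Equation 3 coincides with taking an inner product against the feature map of Equation 4:

import numpy as np

def psi(x, k):
    # Feature map of Equation 4: x -> (1, x, x^2, ..., x^k) in R^(k+1).
    return np.array([x ** j for j in range(k + 1)])

k = 3
w = np.array([1.0, -2.0, 0.5, 3.0])                    # coefficients w_0, ..., w_k
x = 1.7
p_direct = sum(w[j] * x ** j for j in range(k + 1))    # p(x) as in Equation 3
p_linear = float(np.dot(w, psi(x, k)))                 # <w, psi(x)>
print(np.isclose(p_direct, p_linear))                  # True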



Polynomial Feature Mapping

▶ It follows that learning a degree-k polynomial over R can be done by learning a linear mapping in the (k + 1)-dimensional feature space.
▶ Polynomial-based classifiers yield much richer hypothesis
classes than halfspaces.
▶ While the classifier is always linear in the feature space, it can
have highly nonlinear behavior on the original space from
which instances were sampled.
▶ If the range of ψ is very large, we need many more samples in order to learn a halfspace in the range of ψ.
▶ Performing calculations in the high-dimensional space might be too costly.



Kernel Trick

However, computing linear separators in very high-dimensional data may be computationally expensive.
The common solution is kernel-based learning. The term "kernels" describes inner products in the feature space. Given an embedding ψ of some domain space X into a Hilbert space, we define the kernel function:

K (x, x′ ) = ⟨ψ(x), ψ(x′ )⟩. (5)


One can think of K as specifying similarity between instances and
of the embedding ψ as mapping the domain into a more expressive
space.



Kernel Trick

The SVM optimization problem can be generalized as:

min_w ( f(⟨w, ψ(x_1)⟩, . . . , ⟨w, ψ(x_m)⟩) + R(∥w∥) )   (6)

where f : R^m → R is an arbitrary function and R : R_+ → R is a monotonically non-decreasing function.
Special Cases:
▶ Soft-SVM (for homogeneous halfspaces): R(a) = λa² and f(a_1, . . . , a_m) = (1/m) Σ_i max{0, 1 − y_i a_i}.
▶ Hard-SVM (for nonhomogeneous halfspaces): R(a) = a², with the constraints y_i(a_i + b) ≥ 1 for all i.
Key Insight: There exists an optimal solution in the span of
{ψ(x1 ), . . . , ψ(xm )}.



Representer Theorem

Theorem (Representer Theorem)


Assume that ψ is a mapping from X to a Hilbert space. Then, there exists a vector α ∈ R^m such that

w = Σ_{i=1}^{m} α_i ψ(x_i)

is an optimal solution of Equation 6.



SVM optimization problem
Note: All versions of the SVM optimization problem we have
derived so far are instances of the following general problem:

min_w ( f(⟨w, ψ(x_1)⟩, . . . , ⟨w, ψ(x_m)⟩) + R(∥w∥) )   (7)

where f : R^m → R is an arbitrary function and R : R_+ → R is a monotonically nondecreasing function.
Instead of solving Equation 7, we can solve the equivalent problem:

 
min_{α ∈ R^m} f( Σ_{j=1}^{m} α_j K(x_j, x_1), . . . , Σ_{j=1}^{m} α_j K(x_j, x_m) ) + R( √( Σ_{i,j} α_i α_j K(x_j, x_i) ) )   (8)
To solve the optimization problem given in Equation 8, we do not
need any direct access to elements in the feature space. The only
thing we should know is how to calculate inner products in the
feature space, or equivalently, to calculate the kernel function.
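For concreteness, a small sketch (our own; the polynomial kernel and the function names are illustrative) that touches the data only through kernel evaluations, precomputing the matrix of pairwise kernel values used in Equation 8:

import numpy as np

def poly_kernel(x, xp, k=2):
    # Degree-k polynomial kernel K(x, x') = (1 + <x, x'>)^k.
    return (1.0 + np.dot(x, xp)) ** k

def gram_matrix(X, kernel):
    # G[i, j] = K(x_i, x_j); this is all that Equation 8 needs from the data.
    m = len(X)
    G = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            G[i, j] = kernel(X[i], X[j])
    return G

X = np.random.randn(5, 3)               # five instances in R^3
G = gram_matrix(X, poly_kernel)
print(G.shape, np.allclose(G, G.T))     # (5, 5) True: the matrix is symmetric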
SVM optimization problem

In fact, to solve Equation 8 on the previous slide we solely need to know the value of the m × m matrix G such that G_{i,j} = K(x_i, x_j), which is often called the Gram matrix. We can then write the equivalent soft-margin problem as:
min_{α ∈ R^m} ( λ αᵀGα + (1/m) Σ_{i=1}^{m} max(0, 1 − y_i (Gα)_i) )   (9)

where (G α)i is the ith element of the vector obtained by


multiplying the Gram matrix G by the vector α.
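A minimal sketch of one way to solve Equation 9, using subgradient descent on α (the algorithm, step size, and toy data below are our own choices; the slides do not prescribe a particular solver):

import numpy as np

def kernel_soft_svm(G, y, lam=0.1, lr=0.01, T=2000):
    # Subgradient descent on  lam * a^T G a + (1/m) sum_i max(0, 1 - y_i (G a)_i).
    m = len(y)
    a = np.zeros(m)
    for _ in range(T):
        Ga = G @ a
        active = y * Ga < 1                          # examples with nonzero hinge loss
        grad = 2 * lam * Ga - (G[:, active] @ y[active]) / m
        a -= lr * grad
    return a

# Toy run: Gaussian kernel on the 1-D data labeled +1 iff |x| > 2.
xs = np.arange(-10, 11).astype(float)
y = np.where(np.abs(xs) > 2, 1.0, -1.0)
G = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / 2.0)
a = kernel_soft_svm(G, y)
preds = np.sign(G @ a)                               # h(x_i) = sign(sum_j a_j K(x_j, x_i))
print(np.mean(preds == y))                           # training accuracy (should be close to 1.0)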



Polynomial Kernel

The degree k polynomial kernel is defined as

K(x, x′) = (1 + ⟨x, x′⟩)^k.

Now, we will show that this is indeed a kernel function. That is,
we will show that there exists a mapping ψ from the original space
to some higher-dimensional space for which

K (x, x ′ ) = ⟨ψ(x), ψ(x ′ )⟩.

For simplicity, denote x_0 = x′_0 = 1. Then, we have:



Polynomial Kernel

K(x, x′) = (1 + ⟨x, x′⟩)^k = (1 + ⟨x, x′⟩) · · · (1 + ⟨x, x′⟩)

= ( Σ_{j=0}^{n} x_j x′_j ) · · · ( Σ_{j=0}^{n} x_j x′_j )

= Σ_{J ∈ {0,1,...,n}^k} Π_{i=1}^{k} x_{J_i} x′_{J_i}

= Σ_{J ∈ {0,1,...,n}^k} ( Π_{i=1}^{k} x_{J_i} ) ( Π_{i=1}^{k} x′_{J_i} ).

Now, if we define ψ : R^n → R^{(n+1)^k} such that for each J ∈ {0, 1, . . . , n}^k there is an element of ψ(x) that equals Π_{i=1}^{k} x_{J_i}, we obtain that
Polynomial Kernel

K (x, x ′ ) = ⟨ψ(x), ψ(x ′ )⟩.


▶ Since ψ contains all the monomials up to degree k, a
halfspace over the range of ψ corresponds to a polynomial
predictor of degree k over the original space. Hence, learning
a halfspace with a degree k polynomial kernel enables us to
learn polynomial predictors of degree k over the original space.
▶ Note that here the complexity of implementing K is O(n), while the dimension of the feature space is on the order of n^k.
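A brute-force check (our own sketch) of this equality for small n and k, enumerating the (n+1)^k coordinates of ψ explicitly:

import itertools
import numpy as np

def psi(x, k):
    # One coordinate per J in {0,...,n}^k, equal to prod_i x_{J_i}, with x_0 = 1 prepended.
    x_ext = np.concatenate(([1.0], x))
    n = len(x)
    return np.array([np.prod(x_ext[list(J)])
                     for J in itertools.product(range(n + 1), repeat=k)])

def poly_kernel(x, xp, k):
    return (1.0 + np.dot(x, xp)) ** k

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(3), rng.standard_normal(3)
k = 3
print(np.isclose(poly_kernel(x, xp, k), np.dot(psi(x, k), psi(xp, k))))  # True
print(len(psi(x, k)))   # 64 = (n+1)^k coordinates, versus O(n) work for the kernel itself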



Gaussian Kernel
Let the original instance space be R and consider the mapping ψ where, for each nonnegative integer n ≥ 0, there exists an element ψ(x)_n that equals √(1/n!) e^{−x²/2} x^n. Then,

⟨ψ(x), ψ(x′)⟩ = Σ_{n=0}^{∞} ( √(1/n!) e^{−x²/2} x^n ) ( √(1/n!) e^{−(x′)²/2} (x′)^n )

= e^{−(x² + (x′)²)/2} Σ_{n=0}^{∞} (x x′)^n / n!

= e^{−(x² + (x′)²)/2} e^{x x′}

= e^{−∥x − x′∥²/2}.

Here the feature space is of infinite dimension while evaluating the kernel is very simple. More generally, given a scalar σ > 0, the Gaussian kernel is defined to be

K(x, x′) = e^{−∥x − x′∥²/(2σ)}.
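A small sketch (our own) of the kernel, together with a numerical check of the series derivation above for scalar inputs and σ = 1 (the truncation point is arbitrary):

import math
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    # Gaussian (RBF) kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma)).
    diff = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(xp, dtype=float))
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma)))

# psi(x)_n = sqrt(1/n!) * exp(-x^2/2) * x^n, so the truncated series should match K.
x, xp = 0.8, -1.3
series = sum(math.exp(-x ** 2 / 2) * x ** n * math.exp(-xp ** 2 / 2) * xp ** n / math.factorial(n)
             for n in range(60))
print(np.isclose(series, gaussian_kernel(x, xp)))  # True (series truncated at n = 60)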
Gaussian Kernel

▶ Intuitively, the Gaussian kernel sets the inner product in the feature space between x, x′ to be close to zero if the instances are far away from each other (in the original domain) and close to 1 if they are close. σ is a parameter that controls the scale determining what we mean by close.
▶ It is easy to verify that K implements an inner product in a space in which, for any n and any monomial of order k, there exists an element of ψ(x) that equals √(1/n!) e^{−∥x∥²/2} Π_{i=1}^{n} x_{J_i}.
Hence, we can learn any polynomial predictor over the original
space by using a Gaussian kernel.



The RBF Kernel

The Gaussian kernel is also called the RBF kernel, for “Radial
Basis Functions.”



Lemma

A symmetric function K : X × X → R implements an inner product in some Hilbert space if and only if it is positive semidefinite; namely, for all x_1, . . . , x_m, the Gram matrix G_{i,j} = K(x_i, x_j) is a positive semidefinite matrix.
Proof:
It is trivial to see that if K implements an inner product in some
Hilbert space then the Gram matrix is positive semidefinite. For
the other direction, define the space of functions over X as
R^X = {f : X → R}. For each x ∈ X, let ψ(x) be the function
x 7→ K (·, x). Define a vector space by taking all linear
combinations of elements of the form K (·, x). Define an inner
product on this vector space to be
⟨ Σ_i α_i K(·, x_i), Σ_j β_j K(·, x′_j) ⟩ = Σ_{i,j} α_i β_j K(x_i, x′_j).
Lemma

This is a valid inner product since it is symmetric (because K is symmetric), it is linear (immediate), and it is positive definite (it is easy to see that K(x, x) ≥ 0 with equality only for ψ(x) being the zero function). Clearly,

⟨ψ(x), ψ(x ′ )⟩ = ⟨K (·, x), K (·, x ′ )⟩ = K (x, x ′ ),


which concludes our proof.
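As a quick empirical illustration of the easy direction (our own sketch): since the Gaussian kernel implements an inner product, the Gram matrix of any set of points should have no negative eigenvalues, up to numerical error:

import numpy as np

def gaussian_gram(X, sigma=1.0):
    # Gram matrix G[i, j] = exp(-||x_i - x_j||^2 / (2 sigma)).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))                 # twenty arbitrary points in R^4
eigvals = np.linalg.eigvalsh(gaussian_gram(X))   # the Gram matrix is symmetric
print(eigvals.min() >= -1e-10)                   # True: positive semidefinite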

