
Support Vector Machines

Machine Learning

Manoj Kumar

Youtube

October 22, 2024

Manoj Sir (Youtube) Lecture 1 1 / 22


Outline

1 Introduction

2 Kernels



Vector Dot Product



Calculations Involved in Dot Product



Magic!!



Kernel Functions



Example of Kernel Functions



Validity of Kernel Functions



Validity of Kernel Functions

Method 1: We find a function K(x, x′) such that

K(x, x′) = ϕ(x) · ϕ(x′)

Method 2: Mercer's theorem
1. K(x, x′) = K(x′, x) {Symmetric}
2. For data points x1, x2, x3, the kernel matrix K is formed by applying the kernel function to each pair of data points:

      | K(x1, x1)  K(x1, x2)  K(x1, x3) |
  K = | K(x2, x1)  K(x2, x2)  K(x2, x3) |
      | K(x3, x1)  K(x3, x2)  K(x3, x3) |

The kernel matrix K should be symmetric and positive semi-definite (all eigenvalues of K must be non-negative).
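The two Mercer conditions can be checked numerically. A minimal sketch in Python using NumPy, with three illustrative 1-D data points and the RBF kernel as the example kernel function (both choices are assumptions for demonstration, not from the slides):

```python
import numpy as np

# Illustrative 1-D data points x1, x2, x3
X = np.array([0.0, 1.0, 3.0])

# Example kernel: RBF with sigma = 1
def K(x, xp, sigma=1.0):
    return np.exp(-(x - xp) ** 2 / (2 * sigma ** 2))

# Build the 3x3 kernel (Gram) matrix with entries K(x_i, x_j)
G = np.array([[K(a, b) for b in X] for a in X])

# Mercer's conditions: symmetry and non-negative eigenvalues
is_symmetric = np.allclose(G, G.T)
eigenvalues = np.linalg.eigvalsh(G)
is_psd = bool(np.all(eigenvalues >= -1e-10))
```

For a valid kernel, both checks pass for every choice of data points; a single failing Gram matrix is enough to rule a kernel out.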



Types of Kernel Function



Types of Kernel Function

1 Polynomial Kernel Function

K(x, x′) = (x⊤x′ + c)^d

x and x′ are input vectors.
c is a constant that can be adjusted.
d is the degree of the polynomial.


Types of Kernel Function (contd.)

2 Gaussian Kernel / Radial Basis Function (RBF) Kernel

It maps the input space into an infinite-dimensional feature space.

K(x, x′) = exp(−∥x − x′∥² / (2σ²)) = exp(−d² / (2σ²))

d → distance between x and x′
x and x′ are input vectors.
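A minimal sketch of the RBF kernel in Python (the sample points are illustrative), showing that its value depends only on the distance between the inputs:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = np.sum((x - z) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

k_xz = rbf_kernel(x, z)    # distance sqrt(2), so exp(-1)
k_self = rbf_kernel(x, x)  # distance 0, so the kernel value is 1
```

Because only the distance enters, translating both points by the same vector leaves the kernel value unchanged.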



Validity of Kernel Functions

Given that K1 : X × X → R and K2 : X × X → R are two symmetric, positive definite kernel functions, the validity of the following kernel functions is as follows:

Valid Kernel Functions:
K(x, x′) = c · K1(x, x′), where c is a positive constant.
K(x, x′) = K1(x, x′) + K2(x, x′)

Not a Valid Kernel Function (in general):
K(x, x′) = K1(x, x′) + 1/K2(x, x′)
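These rules can be probed numerically on a small Gram matrix. A sketch assuming K1 is the linear kernel and K2 the RBF kernel (both choices are illustrative, chosen to produce a concrete counterexample for the invalid combination):

```python
import numpy as np

X = np.array([0.0, 1.0])          # two illustrative 1-D points

K1 = np.outer(X, X)               # linear kernel matrix: entries x * x'
K2 = np.exp(-(X[:, None] - X[None, :]) ** 2)  # RBF matrix with 2*sigma^2 = 1

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

# Valid combinations: smallest eigenvalue stays non-negative
scale_ok = min_eig(5 * K1) >= -1e-10
sum_ok   = min_eig(K1 + K2) >= -1e-10

# Invalid combination: on this data K1 + 1/K2 has a negative eigenvalue
bad = K1 + 1.0 / K2
has_negative = min_eig(bad) < 0
```

One failing Gram matrix like `bad` is enough to show that K1 + 1/K2 is not a valid kernel in general.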



Question

Consider two data points, x1 = (1, −1) and x2 = (2, 2), in a binary classification task using an SVM with a custom kernel function K(x, y). The kernel function is applied to these points, resulting in the following matrix, referred to as matrix A:

  [ K(x1, x1)  K(x1, x2) ]   [ 1  3 ]
  [ K(x2, x1)  K(x2, x2) ] = [ 3  6 ]

Which of the following statements is correct regarding matrix A and the kernel function K(x, y)?
A) K(x, y) is a valid kernel.
B) K(x, y) is not a valid kernel.
C) Matrix A is positive semi-definite.
D) Matrix A is not positive semi-definite.
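The Mercer check from earlier applies directly; a quick eigenvalue computation settles the question. A sketch:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 6.0]])          # the matrix from the question

eigenvalues = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
min_eigenvalue = eigenvalues.min()   # negative here, so A is not PSD
```

Since det(A) = 1·6 − 3·3 = −3 < 0, one eigenvalue must be negative, so A is not positive semi-definite and K cannot be a valid kernel.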
Question

Given a kernel function K1 : Rn × Rn → R and its corresponding feature map ϕ1 : Rn → Rd, which feature map ϕ : Rn → Rd would correctly produce the kernel cK1(x, z), where c is a positive constant?
a) ϕ(x) = c ϕ1(x)
b) ϕ(x) = √c ϕ1(x)
c) ϕ(x) = c² ϕ1(x)
d) No such feature map exists.
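The scaling property behind this question can be verified directly: scaling a feature map by a factor s scales the induced kernel by s². A sketch, taking ϕ1 to be the identity map purely for illustration:

```python
import numpy as np

c = 5.0
phi1 = lambda x: x                   # illustrative base feature map

def phi(x):
    # candidate map: scale phi1 by sqrt(c)
    return np.sqrt(c) * phi1(x)

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

K1 = phi1(x) @ phi1(z)               # base kernel K1(x, z)
K  = phi(x) @ phi(z)                 # induced kernel, equals c * K1(x, z)
```

The sqrt(c) factor appears once for each argument of the inner product, so the product contributes exactly c.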
Question

Let K1 : X × X → R and K2 : X × X → R be two symmetric, positive definite kernel functions. Which of the following cannot be a valid kernel function?
(a) K(x, x′) = 5 · K1(x, x′)
(b) K(x, x′) = K1(x, x′) + K2(x, x′)
(c) K(x, x′) = K1(x, x′) + 1/K2(x, x′)
(d) All three are valid kernels.
Question

Given a kernel function K(x, x′) = f(x)g(x′) + f(x′)g(x), where f and g are real-valued functions (Rd → R), the kernel is not valid. What additional terms would you include in K to make it a valid kernel?
Options:
(A) f(x) + g(x)
(B) f(x)g(x) + f(x′)g(x′)
(C) f(x)f(x′) + g(x)g(x′)
(D) f(x′) + g(x′)
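One way to probe the options numerically: adding the terms from option (C) completes the expression to (f(x) + g(x))(f(x′) + g(x′)), which is a valid kernel with feature map ϕ(x) = f(x) + g(x). A sketch, with f and g chosen purely for illustration:

```python
import numpy as np

f = lambda x: x[0]                   # illustrative f
g = lambda x: x[1]                   # illustrative g

pts = [np.array([1.0, -1.0]), np.array([2.0, 2.0]), np.array([0.0, 3.0])]

def k(x, z):
    # original kernel plus the option (C) terms:
    # completes the square to (f(x)+g(x)) * (f(z)+g(z))
    return f(x) * g(z) + f(z) * g(x) + f(x) * f(z) + g(x) * g(z)

G = np.array([[k(a, b) for b in pts] for a in pts])
min_eig = np.linalg.eigvalsh(G).min()   # non-negative: Gram matrix is PSD
```

The Gram matrix is the outer product s s⊤ with s_i = f(x_i) + g(x_i), so it is rank one and positive semi-definite by construction.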
Question

Which of the following are properties that a kernel matrix always has?
□ Invertible
□ At least one negative eigenvalue
□ All the entries are positive
□ Symmetric
Question

Suppose ϕ(x) is an arbitrary feature mapping from input x ∈ X to ϕ(x) ∈ RN and let
K (x, z) = ϕ(x)⊤ ϕ(z). Then K (x, z) will always be a valid kernel function.
Circle one: True False
Question

Suppose ϕ(x) is the feature map induced by a polynomial kernel K (x, z) of degree d,
then ϕ(x) should be a d-dimensional vector.
Circle one: True False
Question

Which of the following are valid kernel functions?

⃝ k(x, z) = exp(−∥x − z∥² / (2σ²))
⃝ k(x, z) = ∥x∥∥z∥
⃝ k(x, z) = x⊤ [ 727  1 ; 1  42 ] z
⃝ k(x, z) = x⊤ [ −727  1 ; 1  −42 ] z
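For the last two options, k(x, z) = x⊤Mz with symmetric M is a valid kernel exactly when M is positive semi-definite, so an eigenvalue check decides both. A sketch:

```python
import numpy as np

M_pos = np.array([[727.0, 1.0], [1.0, 42.0]])
M_neg = np.array([[-727.0, 1.0], [1.0, -42.0]])

def is_psd(M):
    # symmetric M is PSD iff its smallest eigenvalue is non-negative
    return bool(np.linalg.eigvalsh(M).min() >= -1e-10)

pos_ok = is_psd(M_pos)   # True: both eigenvalues positive, valid kernel
neg_ok = is_psd(M_neg)   # False: negative trace forces a negative eigenvalue
```

When M is PSD it factors as M = L L⊤ (e.g. via Cholesky), giving the explicit feature map ϕ(x) = L⊤x.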
Question

= x⊤ rev(y
Let x, y ∈ Rd be two points. Consider the function k(x, y )   )where rev(y )
1 3
reverses the order of the components in y . For example, rev 2 = 2. Show that
  
3 1
k cannot be a valid kernel function.
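A useful necessary condition here: any valid kernel satisfies k(x, x) = ϕ(x)⊤ϕ(x) = ∥ϕ(x)∥² ≥ 0. The rev kernel violates it; a sketch with one illustrative point:

```python
import numpy as np

def rev_kernel(x, y):
    # k(x, y) = x^T rev(y), where rev reverses the components of y
    return x @ y[::-1]

x = np.array([1.0, -1.0])
k_xx = rev_kernel(x, x)   # 1*(-1) + (-1)*1 = -2, a negative "self-similarity"
```

Since k(x, x) < 0 for this x, no feature map ϕ can exist, so k is not a valid kernel.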
Question

Which of the following statements about kernels are true?

⃝ A: The dimension of the lifted feature vectors Φ(·), whose inner products the kernel function computes, can be infinite.
⃝ B: For any desired lifting Φ(x), we can design a kernel function k(x, z) that will evaluate Φ(x)⊤Φ(z) more quickly than explicitly computing Φ(x) and Φ(z).
⃝ C: The kernel trick, when it is applicable, speeds up a learning algorithm if the number of sample points is substantially less than the dimension of the (lifted) feature space.
⃝ D: If the raw feature vectors x, y are of dimension 2, then k(x, y) = x1²y1² + x2²y2² is a valid kernel.
Question

Suppose we have a feature map Φ and a kernel function k(Xi, Xj) = Φ(Xi) · Φ(Xj). Select the true statements about kernels.
⃝ A: If there are n sample points of dimension d, it takes O(nd) time to compute the kernel matrix.
⃝ B: The kernel trick implies we do not compute Φ(Xi) explicitly for any sample point Xi.
⃝ C: For every possible feature map Φ : Rd → RD you could imagine, there is a way to compute k(Xi, Xj) in O(d) time.
⃝ D: Running times of kernel algorithms do not depend on the dimension D of the feature space Φ(·).
Thank you!
Questions?

