Kernel Functions
Machine Learning
Manoj Kumar
YouTube
1 Introduction
2 Kernels
K(x, x′) = ϕ(x) · ϕ(x′)
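As a minimal sketch of the identity above, the snippet below assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)² on R² together with the feature map ϕ(x) = (x1², √2·x1·x2, x2²). Both are chosen here purely for illustration and are not taken from the slides. It checks numerically that the kernel evaluated in input space matches the inner product of the mapped points.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Illustrative kernel K(x, z) = (x . z)^2, computed in input space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map matching poly2_kernel for 2-D inputs (assumed example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)     # hypothetical input points
k_direct = poly2_kernel(x, z)      # kernel evaluated without ever forming phi
k_mapped = dot(phi(x), phi(z))     # inner product in the 3-D feature space
print(k_direct, k_mapped)
```

The two values agree up to floating-point rounding, which is what it means for ϕ to be a feature map of K.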
Method 2: Mercer’s theorem
1. K(x, x′) = K(x′, x) (symmetric).
2. For data points x_1, x_2, x_3, the kernel matrix K is formed by applying the kernel function to
each pair of data points:

\[
K = \begin{pmatrix}
K(x_1, x_1) & K(x_1, x_2) & K(x_1, x_3) \\
K(x_2, x_1) & K(x_2, x_2) & K(x_2, x_3) \\
K(x_3, x_1) & K(x_3, x_2) & K(x_3, x_3)
\end{pmatrix}
\]

The kernel matrix K must be symmetric and positive semi-definite (all eigenvalues
of K must be non-negative). This has to hold for every finite set of data points, not just one.
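The two conditions can be spot-checked numerically. The sketch below builds the kernel matrix for three hypothetical data points under the linear kernel (an assumption made for illustration), tests symmetry, and evaluates the quadratic form c⊤Kc for random vectors c. Positive semi-definiteness is equivalent to c⊤Kc ≥ 0 for every c, so a random search can only expose a violation; it cannot prove validity, and an eigenvalue computation would be the definitive test.

```python
import random

def gram_matrix(kernel, points):
    """Kernel matrix with entries K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def is_symmetric(K, tol=1e-12):
    """True if K[i][j] equals K[j][i] for all i, j (up to tol)."""
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))

def quadratic_form(K, c):
    """c^T K c, which is non-negative for every c when K is PSD."""
    n = len(K)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

def linear_kernel(x, z):
    """Assumed example kernel: the ordinary inner product."""
    return sum(a * b for a, b in zip(x, z))

points = [(1.0, -1.0), (2.0, 2.0), (0.5, 3.0)]   # hypothetical data points
K = gram_matrix(linear_kernel, points)

rng = random.Random(0)                            # fixed seed for repeatability
samples = [[rng.uniform(-1, 1) for _ in points] for _ in range(1000)]
min_form = min(quadratic_form(K, c) for c in samples)
print(is_symmetric(K), min_form >= -1e-9)         # small tolerance for rounding
```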
Consider two data points, x_1 = (1, −1) and x_2 = (2, 2), in a binary classification task
using an SVM with a custom kernel function K(x, y). The kernel function is applied to
these points, resulting in the following matrix, referred to as matrix A:

\[
A = \begin{pmatrix}
K(x_1, x_1) & K(x_1, x_2) \\
K(x_2, x_1) & K(x_2, x_2)
\end{pmatrix}
= \begin{pmatrix}
1 & 3 \\
3 & 6
\end{pmatrix}
\]

Which of the following statements is correct regarding matrix A and the kernel function
K(x, y)?
A) K(x, y) is a valid kernel.
B) K(x, y) is not a valid kernel.
C) Matrix A is positive semi-definite.
D) Matrix A is not positive semi-definite.
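One way to approach this question is to compute the eigenvalues of A directly. The sketch below uses the closed-form eigenvalues of a symmetric 2×2 matrix [[a, b], [b, d]], namely (a + d)/2 ± sqrt(((a − d)/2)² + b²). The helper name is hypothetical. It also prints the determinant, which equals the product of the two eigenvalues.

```python
import math

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]] in closed form."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius

# Entries of matrix A from the question: [[1, 3], [3, 6]]
lam_min, lam_max = eigenvalues_sym_2x2(1.0, 3.0, 6.0)
det_A = 1.0 * 6.0 - 3.0 * 3.0     # determinant = product of the eigenvalues
print(lam_min, lam_max, det_A)
```

A negative determinant for a 2×2 symmetric matrix means the eigenvalues have opposite signs, so one of them is negative. A single kernel matrix with a negative eigenvalue is enough to rule out a valid kernel.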
Question
Given a kernel function K(x, x′) = f(x)g(x′) + f(x′)g(x), where f and g are real-valued
functions (R^d → R), the kernel is not valid in general. What additional terms would you
include in K to make it a valid kernel?
Options:
(A) f(x) + g(x)
(B) f(x)g(x) + f(x′)g(x′)
(C) f(x)f(x′) + g(x)g(x′)
(D) f(x′) + g(x′)
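A short derivation may help here. It relies only on two facts: a valid kernel needs K(x, x) ≥ 0 for every x (the 1×1 kernel matrix must be positive semi-definite), and any function of the form ϕ(x)ϕ(x′) is a valid kernel. The scalar feature map ϕ below is notation introduced for this sketch.

```latex
\begin{align*}
K(x, x) &= 2\, f(x)\, g(x) < 0
  \quad \text{whenever } f(x) \text{ and } g(x) \text{ have opposite signs}, \\[4pt]
K(x, x') + f(x) f(x') + g(x) g(x')
  &= f(x) g(x') + f(x') g(x) + f(x) f(x') + g(x) g(x') \\
  &= \bigl(f(x) + g(x)\bigr)\bigl(f(x') + g(x')\bigr) \\
  &= \phi(x)\, \phi(x'), \qquad \phi(x) := f(x) + g(x).
\end{align*}
```

The first line shows why K alone can fail. The remaining lines show that adding the terms f(x)f(x′) + g(x)g(x′) completes the product into an inner product of one-dimensional features.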
Question
Which of the following are properties that a kernel matrix always has?
□ Invertible
□ At least one negative eigenvalue
□ All the entries are positive
□ Symmetric
Question
Suppose ϕ(x) is an arbitrary feature mapping from input x ∈ X to ϕ(x) ∈ R^N and let
K(x, z) = ϕ(x)⊤ϕ(z). Then K(x, z) will always be a valid kernel function.
Circle one: True False
Question
Suppose ϕ(x) is the feature map induced by a polynomial kernel K(x, z) of degree d.
Then ϕ(x) should be a d-dimensional vector.
Circle one: True False
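To get a feel for the sizes involved, the sketch below counts the distinct monomials in the feature map of a polynomial kernel on n-dimensional inputs. It assumes the two standard forms K(x, z) = (1 + x · z)^d and K(x, z) = (x · z)^d. The function name and the example sizes are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def poly_feature_dim(n_inputs, degree, homogeneous=False):
    """Number of distinct monomials in a polynomial kernel's feature map.

    homogeneous=False: K(x, z) = (1 + x . z)^degree, monomials of degree <= degree
    homogeneous=True:  K(x, z) = (x . z)^degree,     monomials of degree == degree
    """
    if homogeneous:
        return comb(n_inputs + degree - 1, degree)
    return comb(n_inputs + degree, degree)

# Hypothetical (input dimension, kernel degree) pairs
for n, d in [(2, 2), (10, 3), (100, 5)]:
    print(n, d, poly_feature_dim(n, d))
```

The count depends on both the input dimension and the degree, and it grows quickly with either one.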
Question
Let x, y ∈ R^d be two points. Consider the function k(x, y) = x⊤ rev(y), where rev(y)
reverses the order of the components in y. For example, rev([1, 2, 3]⊤) = [3, 2, 1]⊤. Show that
k cannot be a valid kernel function.
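One possible line of attack is to look for a point x with k(x, x) < 0, since a valid kernel satisfies k(x, x) = ϕ(x) · ϕ(x) ≥ 0. The sketch below assumes d ≥ 2 (for d = 1, rev does nothing) and tries the illustrative point x = e_1 − e_d with d = 3.

```python
def rev_kernel(x, y):
    """k(x, y) = x^T rev(y): inner product of x with y in reversed order."""
    return sum(a * b for a, b in zip(x, reversed(y)))

x = (1.0, 0.0, -1.0)      # illustrative choice: e_1 - e_d with d = 3
k_xx = rev_kernel(x, x)   # the single entry of the 1x1 kernel matrix for x
print(k_xx)

# k is symmetric, so symmetry alone does not disqualify it
print(rev_kernel((1, 2, 3), (4, 5, 6)) == rev_kernel((4, 5, 6), (1, 2, 3)))
```

The same choice of x works for any d ≥ 2, because the first and last components always pair with each other under rev.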
Question
Suppose we have a feature map Φ and a kernel function k(X_i, X_j) = Φ(X_i) · Φ(X_j).
Select the true statements about kernels.
⃝ A: If there are n sample points of dimension d, it takes O(nd) time to compute
the kernel matrix.
⃝ B: The kernel trick implies we do not compute Φ(X_i) explicitly for any sample
point X_i.
⃝ C: For every possible feature map Φ : R^d → R^D you could imagine, there is a
way to compute k(X_i, X_j) in O(d) time.
⃝ D: Running times of kernel algorithms do not depend on the dimension D of
the feature space Φ(·).
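For statement A, it can help to count operations. The sketch below assumes the linear kernel, where one evaluation costs d scalar multiplications, and tallies the multiplications needed to fill the full n × n kernel matrix. The helper names and the sizes n and d are hypothetical.

```python
def linear_kernel_counting(x, z, counter):
    """Linear kernel x . z that also tallies scalar multiplications."""
    counter[0] += len(x)
    return sum(a * b for a, b in zip(x, z))

def gram_matrix_cost(points):
    """Build the full n x n kernel matrix; return (K, multiplication count)."""
    counter = [0]
    K = [[linear_kernel_counting(p, q, counter) for q in points] for p in points]
    return K, counter[0]

n, d = 5, 4                                    # hypothetical sizes
points = [[float(i + j) for j in range(d)] for i in range(n)]
K, mults = gram_matrix_cost(points)
print(mults, n * n * d)                        # both counts coincide
```

Exploiting symmetry would roughly halve the work, but the count still grows with n² rather than n.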
Thank you!
Questions?