Function Spaces
A function space is a set of functions F that has some structure. Often a nonparametric
regression function or classifier is chosen to lie in some function space, where the assumed
structure is exploited by algorithms and theoretical analysis. Here we review some basic
facts about function spaces.
1 Hilbert Spaces
A norm on a vector space $V$ is a mapping $\|\cdot\| : V \to [0, \infty)$ such that:
1. $\|x + y\| \le \|x\| + \|y\|$;
2. $\|ax\| = |a|\,\|x\|$ for all $a \in \mathbb{R}$;
3. $\|x\| = 0$ if and only if $x = 0$.
An example of a norm on $V = \mathbb{R}^k$ is the Euclidean norm $\|x\| = \sqrt{\sum_i x_i^2}$. A sequence $x_1, x_2, \ldots$ in a normed space is a Cauchy sequence if $\|x_m - x_n\| \to 0$ as $m, n \to \infty$. The space is complete if every Cauchy sequence converges to a limit. A complete normed space is called a Banach space.
An inner product on $V$ is a mapping $\langle \cdot, \cdot\rangle : V \times V \to \mathbb{R}$ such that:
1. $\langle x, x\rangle \ge 0$;
2. $\langle x, x\rangle = 0$ if and only if $x = 0$;
3. $\langle ax + by, z\rangle = a\langle x, z\rangle + b\langle y, z\rangle$ for all $a, b \in \mathbb{R}$;
4. $\langle x, y\rangle = \langle y, x\rangle$.
Two vectors $x$ and $y$ are orthogonal if $\langle x, y\rangle = 0$. An inner product defines a norm $\|v\| = \sqrt{\langle v, v\rangle}$. We then have the Cauchy-Schwarz inequality
$$|\langle x, y\rangle| \le \|x\|\,\|y\|. \qquad (1)$$
A Hilbert space is a complete inner product space. Every Hilbert space is a Banach space but the reverse is not true in general. In a Hilbert space, we write $f_n \to f$ to mean that $\|f_n - f\| \to 0$ as $n \to \infty$. Note that $\|f_n - f\| \to 0$ does NOT imply that $f_n(x) \to f(x)$. For this to be true, we need the space to be a reproducing kernel Hilbert space, which we discuss later.
If $V$ is a Hilbert space and $L$ is a closed subspace then for any $v \in V$ there is a unique $y \in L$, called the projection of $v$ onto $L$, which minimizes $\|v - z\|$ over $z \in L$. The set of elements orthogonal to every $z \in L$ is denoted by $L^\perp$. Every $v \in V$ can be written uniquely as $v = w + z$ where $z$ is the projection of $v$ onto $L$ and $w \in L^\perp$. In general, if $L$ and $M$ are subspaces such that every $\ell \in L$ is orthogonal to every $m \in M$ then we define the orthogonal sum (or direct sum) as
$$L \oplus M = \{\ell + m : \ell \in L,\ m \in M\}. \qquad (2)$$
A set of vectors $\{e_t,\ t \in T\}$ is orthonormal if $\langle e_s, e_t\rangle = 0$ when $s \neq t$ and $\|e_t\| = 1$ for all $t \in T$. If $\{e_t,\ t \in T\}$ are orthonormal, and the only vector orthogonal to each $e_t$ is the zero vector, then $\{e_t,\ t \in T\}$ is called an orthonormal basis. Every Hilbert space has an orthonormal basis. A Hilbert space is separable if there exists a countable orthonormal basis.
Theorem 1 Let $V$ be a separable Hilbert space with countable orthonormal basis $\{e_1, e_2, \ldots\}$. Then, for any $x \in V$, we have $x = \sum_{j=1}^{\infty} \theta_j e_j$ where $\theta_j = \langle x, e_j\rangle$. Furthermore,
$$\|x\|^2 = \sum_{j=1}^{\infty} \theta_j^2,$$
which is known as Parseval's identity.
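As a finite-dimensional sanity check, here is a short Python sketch (all variable names are our own) that builds a random orthonormal basis of $\mathbb{R}^8$ via a QR factorization and verifies both the expansion and Parseval's identity:

```python
# Finite-dimensional illustration of Theorem 1.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # columns = orthonormal basis e_j
x = rng.normal(size=d)

theta = Q.T @ x                  # theta_j = <x, e_j>
x_rec = Q @ theta                # sum_j theta_j e_j
print(np.allclose(x, x_rec))               # True: the expansion recovers x
print(np.allclose(x @ x, theta @ theta))   # True: ||x||^2 = sum_j theta_j^2
```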
The set $\mathbb{R}^d$ with inner product $\langle v, w\rangle = \sum_j v_j w_j$ is a Hilbert space. Another example of a Hilbert space is the set of functions $f : [a,b] \to \mathbb{R}$ such that $\int_a^b f^2(x)\,dx < \infty$, with inner product $\langle f, g\rangle = \int f(x)\,g(x)\,dx$. This space is denoted by $L_2(a,b)$.
2 Lp Spaces
For $p \ge 1$, define $\|f\|_p = \left( \int_a^b |f(x)|^p\,dx \right)^{1/p}$. Sometimes we write $\|f\|_2$ simply as $\|f\|$. The space $L_p(a,b)$ is defined as follows:
$$L_p(a,b) = \Big\{ f : [a,b] \to \mathbb{R} \;:\; \|f\|_p < \infty \Big\}. \qquad (5)$$
If $\{\phi_1, \phi_2, \ldots\}$ is an orthonormal basis for $L_2(a,b)$, then any $f \in L_2(a,b)$ can be written as $f(x) = \sum_{j=1}^{\infty} \theta_j\,\phi_j(x)$, where
$$\theta_j = \int_a^b f(x)\,\phi_j(x)\,dx \qquad (7)$$
are the coefficients. Also, recall Parseval's identity
$$\int_a^b f^2(x)\,dx = \sum_{j=1}^{\infty} \theta_j^2. \qquad (8)$$
The Fourier basis on $[0,1]$ is defined by setting $\phi_1(x) = 1$ and
$$\phi_{2j}(x) = \sqrt{2}\,\cos(2j\pi x), \qquad \phi_{2j+1}(x) = \sqrt{2}\,\sin(2j\pi x), \qquad j = 1, 2, \ldots \qquad (10)$$
The cosine basis on $[0,1]$ is defined by
$$\phi_0(x) = 1, \qquad \phi_j(x) = \sqrt{2}\,\cos(2\pi j x), \qquad j = 1, 2, \ldots \qquad (11)$$
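To make the coefficient formulas concrete, the following Python sketch computes the coefficients $\theta_j$ of an arbitrary function in the Fourier basis (10) by numerical integration and checks Parseval's identity (8); the test function and all names are our own choices:

```python
# Fourier coefficients and Parseval's identity, numerically.
import numpy as np

x = np.linspace(0, 1, 20001)
f = x**2 + np.sin(3 * x)             # any square-integrable f on [0,1]

def phi(j):
    # Basis (10): phi_1 = 1, phi_{2k} = sqrt(2) cos(2k pi x),
    # phi_{2k+1} = sqrt(2) sin(2k pi x)
    if j == 1:
        return np.ones_like(x)
    k = j // 2
    trig = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2) * trig(2 * np.pi * k * x)

# theta_j = int_0^1 f phi_j, approximated by an average over the grid
theta = np.array([(f * phi(j)).mean() for j in range(1, 2001)])
print((f**2).mean(), (theta**2).sum())   # nearly equal, as (8) predicts
```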
The Haar basis is built from the father wavelet $\phi$ and the mother wavelet $\psi$, where
$$\phi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise,} \end{cases} \qquad (16)$$
$\psi_{jk}(x) = 2^{j/2}\,\psi(2^j x - k)$, and
$$\psi(x) = \begin{cases} -1 & \text{if } 0 \le x \le \frac{1}{2} \\ 1 & \text{if } \frac{1}{2} < x \le 1. \end{cases} \qquad (17)$$
This is a doubly indexed set of functions, so when $f$ is expanded in this basis we write
$$f(x) = \alpha\,\phi(x) + \sum_{j=1}^{\infty} \sum_{k=1}^{2^j - 1} \beta_{jk}\,\psi_{jk}(x) \qquad (18)$$
where $\alpha = \int_0^1 f(x)\,\phi(x)\,dx$ and $\beta_{jk} = \int_0^1 f(x)\,\psi_{jk}(x)\,dx$. The Haar basis is an example of a
wavelet basis.
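A small numerical sketch of the Haar expansion (18): we compute $\alpha$ and the $\beta_{jk}$ on a dyadic grid and reconstruct $f$ from the first few resolution levels. For completeness of the system the code indexes $j \ge 0$ and $k = 0, \ldots, 2^j - 1$ (indexing conventions vary); the test function is our own choice:

```python
# Haar coefficients and partial reconstruction.
import numpy as np

n = 2**12
x = (np.arange(n) + 0.5) / n                          # midpoints of a dyadic grid
f = np.where(x < 0.5, np.sin(2 * np.pi * x), 1 - x)   # an inhomogeneous f

def psi_jk(j, k):
    # psi_jk(x) = 2^{j/2} psi(2^j x - k), psi = -1 on [0, 1/2], +1 on (1/2, 1]
    u = 2**j * x - k
    return 2**(j / 2) * np.where(u <= 0.5, -1.0, 1.0) * ((u >= 0) & (u <= 1))

alpha = f.mean()                                      # int f phi, with phi = 1
f_hat = alpha * np.ones(n)
for j in range(8):
    for k in range(2**j):
        beta = (f * psi_jk(j, k)).mean()              # int f psi_jk
        f_hat += beta * psi_jk(j, k)
print(np.abs(f - f_hat).max())                        # small reconstruction error
```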
Let $[a,b]^d = [a,b] \times \cdots \times [a,b]$ be the $d$-dimensional cube and define
$$L_2\big([a,b]^d\big) = \left\{ f : [a,b]^d \to \mathbb{R} \;:\; \int_{[a,b]^d} f^2(x_1, \ldots, x_d)\,dx_1 \cdots dx_d < \infty \right\}. \qquad (19)$$
Suppose that $B = \{\phi_1, \phi_2, \ldots\}$ is an orthonormal basis for $L_2([a,b])$. Then the set of functions
$$B_d = B \otimes \cdots \otimes B = \Big\{ \phi_{i_1}(x_1)\,\phi_{i_2}(x_2) \cdots \phi_{i_d}(x_d) \;:\; i_1, i_2, \ldots, i_d \in \{1, 2, \ldots\} \Big\} \qquad (20)$$
is called the tensor product of $B$, and forms an orthonormal basis for $L_2([a,b]^d)$.
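The following sketch checks, for $d = 2$ and the cosine basis, that two tensor-product functions from (20) are orthonormal on a grid (a numerical approximation; the indices are our choice):

```python
# Tensor-product basis functions on [0,1]^2.
import numpy as np

def phi(j, t):
    # one-dimensional cosine basis (11)
    return np.ones_like(t) if j == 0 else np.sqrt(2) * np.cos(2 * np.pi * j * t)

t = np.linspace(0, 1, 1001)
X1, X2 = np.meshgrid(t, t)

def phi2(i1, i2):
    # phi_{i1}(x1) * phi_{i2}(x2), a member of B_2 in (20)
    return phi(i1, X1) * phi(i2, X2)

print((phi2(1, 2) * phi2(1, 2)).mean())   # ~1: unit L2 norm on [0,1]^2
print((phi2(1, 2) * phi2(2, 1)).mean())   # ~0: distinct indices are orthogonal
```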
3 Hölder Spaces
Let $\beta$ be a positive integer.¹ Let $T \subset \mathbb{R}$. The Hölder space $H(\beta, L)$ is the set of functions $g : T \to \mathbb{R}$ such that
$$|g^{(\beta - 1)}(y) - g^{(\beta - 1)}(x)| \le L\,|x - y|, \quad \text{for all } x, y \in T. \qquad (21)$$
The special case $\beta = 1$ is sometimes called the Lipschitz space. If $\beta = 2$ then we have
$$|g'(x) - g'(y)| \le L\,|x - y|, \quad \text{for all } x, y.$$
Roughly speaking, this means that the functions have bounded second derivatives.
¹ It is possible to define Hölder spaces for non-integers, but we will not need this generalization.
In the case of $\beta = 2$, this means that
$$|g(x + h) - g(x) - g'(x)\,h| \le \frac{L h^2}{2} \quad \text{for all } x, h,$$
which follows from Taylor's theorem together with (21).
We will see that in function estimation, the optimal rate of convergence over $H(\beta, L)$ under $L_2$ loss is $O(n^{-2\beta/(2\beta + d)})$.
4 Sobolev Spaces
The Sobolev space $W_{m,p}$ is, roughly speaking, the set of functions whose derivatives up to order $m$ are in $L_p$; on $[0,1]$ one can take the functions $f$ such that $f^{(m-1)}$ is absolutely continuous and $f^{(m)} \in L_p$. For the rest of this section we take $p = 2$ and write $W_m$ instead of $W_{m,2}$.
Theorem 2 The Sobolev space $W_m$ is a Hilbert space under the inner product
$$\langle f, g\rangle = \sum_{k=0}^{m-1} f^{(k)}(0)\,g^{(k)}(0) + \int_0^1 f^{(m)}(x)\,g^{(m)}(x)\,dx. \qquad (27)$$
Define
$$K(x, y) = \sum_{k=0}^{m-1} \frac{x^k y^k}{(k!)^2} + \int_0^{x \wedge y} \frac{(x - u)^{m-1} (y - u)^{m-1}}{((m-1)!)^2}\,du. \qquad (28)$$
Then, for each $f \in W_m$ we have
$$f(y) = \langle f, K(\cdot, y)\rangle \qquad (29)$$
and
$$K(x, y) = \langle K(\cdot, x), K(\cdot, y)\rangle. \qquad (30)$$
We say that $K$ is a kernel for the space and that $W_m$ is a reproducing kernel Hilbert space, or RKHS. See Section 7 for more on reproducing kernel Hilbert spaces.
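The reproducing property (29) can be checked numerically. For $m = 1$, (28) reduces to $K(x, y) = 1 + x \wedge y$ and (27) to $\langle f, g\rangle = f(0)g(0) + \int_0^1 f'g'$. A sketch, with an arbitrary smooth test function:

```python
# Check f(y) = <f, K(., y)> in the Sobolev space W_1.
import numpy as np

x = np.linspace(0, 1, 100001)
f = np.exp(x) * np.sin(2 * x)             # arbitrary smooth f
fprime = np.gradient(f, x)

def inner_with_Ky(y):
    Ky = 1 + np.minimum(x, y)             # K(., y) for m = 1
    Kyprime = np.gradient(Ky, x)          # = 1 for x < y, 0 for x > y
    return f[0] * Ky[0] + (fprime * Kyprime).mean()   # inner product (27)

for y in (0.25, 0.5, 0.9):
    print(inner_with_Ky(y), f[int(y * (len(x) - 1))])  # columns nearly agree
```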
It follows from Mercer's theorem (Theorem 4) that there is an orthonormal basis $\{e_1, e_2, \ldots\}$ for $L_2(a,b)$ and real numbers $\lambda_1, \lambda_2, \ldots$ such that
$$K(x, y) = \sum_{j=1}^{\infty} \lambda_j\,e_j(x)\,e_j(y). \qquad (31)$$
The functions $e_j$ are eigenfunctions of $K$ and the $\lambda_j$'s are the corresponding eigenvalues:
$$\int K(x, y)\,e_j(y)\,dy = \lambda_j\,e_j(x). \qquad (32)$$
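Numerically, the eigenfunctions and eigenvalues in (32) can be approximated by discretizing the integral operator: on a uniform grid of $n$ points, the operator becomes the matrix $\mathbb{K}/n$. A sketch using the Gaussian kernel (our choice of kernel and bandwidth):

```python
# Discretized eigenproblem for the integral operator.
import numpy as np

n = 500
x = (np.arange(n) + 0.5) / n
K = np.exp(-(x[:, None] - x[None, :])**2 / 0.1**2)  # Gaussian kernel, sigma = 0.1
lam, U = np.linalg.eigh(K / n)                      # K/n approximates the operator
lam, U = lam[::-1], U[:, ::-1]                      # sort eigenvalues descending

print(lam[:5])                 # rapidly decaying eigenvalues lambda_j
e1 = np.sqrt(n) * U[:, 0]      # rescaled so that mean(e1^2) = int e_1^2 = 1
print((e1**2).mean())          # ~1: e_1 is normalized in L2
```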
Next we discuss how the functions in a Sobolev space can be parameterized by using another convenient basis. An ellipsoid is a set of the form
$$\Theta = \left\{ \theta : \sum_{j=1}^{\infty} a_j^2\,\theta_j^2 \le c^2 \right\}. \qquad (34)$$
Roughly speaking, $f = \sum_j \theta_j \phi_j$ (expanded in the Fourier basis) belongs to a Sobolev ball if and only if $\theta$ belongs to an ellipsoid of this form, where $a_j = (\pi j)^m$ for $j$ even and $a_j = (\pi (j-1))^m$ for $j$ odd. Thus, a Sobolev space corresponds to a Sobolev ellipsoid with $a_j^2 \sim (\pi j)^{2m}$.
Note that (36) allows us to define the Sobolev space $W_m$ for fractional values of $m$ as well as integer values. A multivariate version of Sobolev spaces can be defined as follows. Let $\alpha = (\alpha_1, \ldots, \alpha_d)$ be non-negative integers and define $|\alpha| = \alpha_1 + \cdots + \alpha_d$. Given $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, write $x^\alpha = x_1^{\alpha_1} \cdots x_d^{\alpha_d}$ and
$$D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}. \qquad (37)$$
Then the Sobolev space is defined by
$$W_{m,p} = \Big\{ f \in L_p([a,b]^d) : D^\alpha f \in L_p([a,b]^d) \text{ for all } |\alpha| \le m \Big\}. \qquad (38)$$
We will see that in function estimation, the optimal rate of convergence over $W_{\beta,2}$ under $L_2$ loss is $O(n^{-2\beta/(2\beta + d)})$.
5 Besov Spaces*
Functions in Sobolev spaces are homogeneous, meaning that their smoothness does not vary
substantially across the domain of the function. Besov spaces are richer classes of functions
that include inhomogeneous functions.
Let
$$\Delta_h^{(r)} f(x) = \sum_{k=0}^{r} \binom{r}{k} (-1)^{r-k}\,f(x + kh) \qquad (39)$$
denote the $r$-th order difference. Thus, $\Delta_h^{(0)} f(x) = f(x)$ and
$$\Delta_h^{(r)} f(x) = \Delta_h^{(r-1)} f(x + h) - \Delta_h^{(r-1)} f(x). \qquad (40)$$
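A quick numerical check that the explicit formula (39) and the recursion (40) agree (the test function, step size, and grid are arbitrary choices):

```python
# r-th order differences: formula (39) versus recursion (40).
import numpy as np
from math import comb

def delta(f, x, h, r):
    # Delta_h^{(r)} f(x) = sum_k C(r, k) (-1)^{r-k} f(x + k h)
    return sum(comb(r, k) * (-1)**(r - k) * f(x + k * h) for k in range(r + 1))

f = lambda t: np.sin(3 * t) + t**2
x = np.linspace(0, 1, 11)
h = 0.01
lhs = delta(f, x, h, 3)
rhs = delta(f, x + h, h, 2) - delta(f, x, h, 2)   # recursion (40)
print(np.allclose(lhs, rhs))                       # True
```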
Next define
$$w_{r,p}(f; t) = \sup_{|h| \le t} \big\| \Delta_h^{(r)} f \big\|_p \qquad (41)$$
where $\|g\|_p = \left( \int |g(x)|^p\,dx \right)^{1/p}$. Given $(p, q, \varsigma)$, let $r$ be such that $r - 1 \le \varsigma \le r$. The Besov seminorm is defined by
$$\|f\|_{p,q}^{\varsigma} = \left( \int_0^\infty \big( h^{-\varsigma}\,w_{r,p}(f; h) \big)^q\,\frac{dh}{h} \right)^{1/q}. \qquad (42)$$
For $q = \infty$ we define
$$\|f\|_{p,\infty}^{\varsigma} = \sup_{0 < h < 1} \frac{w_{r,p}(f; h)}{h^{\varsigma}}. \qquad (43)$$
The Besov space $B_{p,q}^{\varsigma}(c)$ is defined to be the set of functions $f$ mapping $[0,1]$ into $\mathbb{R}$ such that $\int |f|^p < \infty$ and $\|f\|_{p,q}^{\varsigma} \le c$.
Besov spaces include a wide range of familiar function spaces. The Sobolev space $W_{m,2}$ corresponds to the Besov ball $B_{2,2}^{m}$. The generalized Sobolev space $W_{m,p}$, which uses an $L_p$ norm on the $m$-th derivative, is almost a Besov space in the sense that $B_{p,1}^{m} \subset W_p(m) \subset B_{p,\infty}^{m}$. The Hölder space $H_\alpha$ with $\alpha = k + \beta$ is equivalent to $B_{\infty,\infty}^{k+\beta}$, and the set $T$ consisting of functions of bounded variation satisfies $B_{1,1}^{1} \subset T \subset B_{1,\infty}^{1}$.
6 Entropy and Dimension

One way to measure the size of a function space is its metric entropy $H(\epsilon)$: the logarithm of the smallest number of balls of radius $\epsilon$ needed to cover the space. For balls in the spaces above (in dimension $d$), the entropy scales as follows:

Space                        $H(\epsilon)$
Sobolev $W_{m,p}$            $\epsilon^{-d/m}$
Besov $B_{p,q}^{\varsigma}$  $\epsilon^{-d/\varsigma}$
Hölder $H_\alpha$            $\epsilon^{-d/\alpha}$

7 Reproducing Kernel Hilbert Spaces
Intuitively, a reproducing kernel Hilbert space (RKHS) is a class of smooth functions defined
by an object called a Mercer kernel. Here are the details.
Mercer Kernels. A Mercer kernel is a continuous function $K : [a,b] \times [a,b] \to \mathbb{R}$ such that $K(x, y) = K(y, x)$, and such that $K$ is positive semidefinite, meaning that
$$\sum_{i=1}^{n} \sum_{j=1}^{n} K(x_i, x_j)\,c_i\,c_j \ge 0 \qquad (45)$$
for all finite sets of points $x_1, \ldots, x_n \in [a,b]$ and all real numbers $c_1, \ldots, c_n$. The function
$$K(x, y) = \sum_{k=0}^{m-1} \frac{x^k y^k}{(k!)^2} + \int_0^{x \wedge y} \frac{(x - u)^{m-1} (y - u)^{m-1}}{((m-1)!)^2}\,du \qquad (46)$$
introduced in Section 4 on Sobolev spaces is an example of a Mercer kernel. The most commonly used kernel is the Gaussian kernel
$$K(x, y) = e^{-\|x - y\|^2 / \sigma^2}.$$
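As a quick illustration of (45), the Gram matrix of the Gaussian kernel at any finite point set should have no negative eigenvalues (up to roundoff). A sketch with arbitrary points and bandwidth:

```python
# Empirical check of positive semidefiniteness for the Gaussian kernel.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=30)
sigma = 0.2
K = np.exp(-(x[:, None] - x[None, :])**2 / sigma**2)  # Gram matrix K(x_i, x_j)
print(np.linalg.eigvalsh(K).min() >= -1e-10)          # True, up to roundoff
```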
Theorem 4 (Mercer's theorem) Suppose that $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is symmetric and satisfies $\sup_{x,y} K(x, y) < \infty$, and define
$$T_K f(x) = \int_{\mathcal{X}} K(x, y)\,f(y)\,dy. \qquad (47)$$
Suppose that $T_K : L_2(\mathcal{X}) \to L_2(\mathcal{X})$ is positive semidefinite; thus,
$$\int_{\mathcal{X}} \int_{\mathcal{X}} K(x, y)\,f(x)\,f(y)\,dx\,dy \ge 0 \qquad (48)$$
for every $f \in L_2(\mathcal{X})$. Then there is an orthonormal basis of eigenfunctions $e_1, e_2, \ldots$ of $T_K$ with non-negative eigenvalues $\lambda_1, \lambda_2, \ldots$ such that $K(x, y) = \sum_j \lambda_j\,e_j(x)\,e_j(y)$.
The positive semidefinite requirement for Mercer kernels is generally difficult to verify directly. But the following basic results show how one can build up kernels in pieces: sums and products of Mercer kernels are Mercer kernels, as are positive scalar multiples of them.
RKHS. Given a kernel $K$, let $K_x(\cdot)$ be the function obtained by fixing the first coordinate. That is, $K_x(y) = K(x, y)$. For the Gaussian kernel, $K_x$ is a Normal density (up to rescaling), centered at $x$. We can create functions by taking linear combinations of the kernel:
$$f(x) = \sum_{j=1}^{k} \alpha_j\,K_{x_j}(x).$$
Given two such functions $f(x) = \sum_{j=1}^{k} \alpha_j K_{x_j}(x)$ and $g(x) = \sum_{j=1}^{m} \beta_j K_{y_j}(x)$ we define an inner product
$$\langle f, g\rangle = \langle f, g\rangle_K = \sum_i \sum_j \alpha_i\,\beta_j\,K(x_i, y_j).$$
In general, $f$ (and $g$) might be representable in more than one way. You can check that $\langle f, g\rangle_K$ is independent of how $f$ (or $g$) is represented. The inner product defines a norm:
$$\|f\|_K = \sqrt{\langle f, f\rangle} = \sqrt{\sum_j \sum_k \alpha_j\,\alpha_k\,K(x_j, x_k)} = \sqrt{\alpha^T \mathbb{K} \alpha}$$
where $\alpha = (\alpha_1, \ldots, \alpha_k)^T$ and $\mathbb{K}$ is the matrix with entries $\mathbb{K}_{jk} = K(x_j, x_k)$.
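These formulas are easy to compute directly. A sketch for two functions in the span of the Gaussian kernel, with arbitrary points and coefficients:

```python
# <f, g>_K and ||f||_K for f = sum_j alpha_j K_{x_j}, g = sum_j beta_j K_{y_j}.
import numpy as np

sigma = 0.5
def K(a, b):
    # Gram matrix of the Gaussian kernel between point sets a and b
    return np.exp(-(a[:, None] - b[None, :])**2 / sigma**2)

xs, alpha = np.array([0.1, 0.4, 0.8]), np.array([1.0, -2.0, 0.5])  # represents f
ys, beta = np.array([0.2, 0.9]), np.array([0.3, 1.5])              # represents g

print(alpha @ K(xs, ys) @ beta)             # <f, g>_K
print(np.sqrt(alpha @ K(xs, xs) @ alpha))   # ||f||_K = sqrt(alpha^T K alpha)
```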
Since $f = \sum_j \alpha_j K_{x_j}$, taking $g = K_x$ in the definition of $\langle f, g\rangle$ gives
$$\langle f, K_x\rangle = \sum_j \alpha_j K(x_j, x) = f(x).$$
This implies that
$$\langle K_x, K_y\rangle = K(x, y).$$
This is called the reproducing property. It also implies that $K_x$ is the representer of the evaluation functional.
To verify that this is a well-defined Hilbert space, you should check that the following properties hold:
$$\langle f, g\rangle = \langle g, f\rangle,$$
$$\langle cf + dg, h\rangle = c\langle f, h\rangle + d\langle g, h\rangle,$$
$$\langle f, f\rangle = 0 \text{ if and only if } f = 0.$$
The last one is not obvious, so let us verify it here. It is easy to see that $f = 0$ implies that $\langle f, f\rangle = 0$. Now we must show that $\langle f, f\rangle = 0$ implies that $f(x) = 0$. So suppose that $\langle f, f\rangle = 0$. Pick any $x$. Then, by the reproducing property and the Cauchy-Schwarz inequality,
$$f^2(x) = \langle f, K_x\rangle^2 \le \|f\|_K^2\,\|K_x\|_K^2 = \langle f, f\rangle\,K(x, x) = 0,$$
so $f(x) = 0$.
The evaluation functional $\delta_x$ maps a function to its value at $x$: $\delta_x f = f(x)$. In an RKHS, the evaluation functional is continuous. Intuitively, this means that the functions in the space are well-behaved. To see this, suppose that $f_n \to f$. Then
$$\delta_x f_n = \langle f_n, K_x\rangle \to \langle f, K_x\rangle = f(x) = \delta_x f.$$
Example 5 Let $\mathbb{H}$ be all functions $f$ on $\mathbb{R}$ such that the support of the Fourier transform of $f$ is contained in $[-a, a]$. Then
$$K(x, y) = \frac{\sin(a(y - x))}{a(y - x)}$$
and
$$\langle f, g\rangle = \int f g.$$
Example 6 Let $\mathbb{H}$ be the space of functions on $(0,1)$ with norm
$$\|f\|^2 = \int_0^1 \big(f^2(x) + (f'(x))^2\big)\,x^2\,dx.$$
Then the reproducing kernel is
$$K(x, y) = (xy)^{-1}\Big( e^{-x} \sinh(y)\,I(0 < x \le y) + e^{-y} \sinh(x)\,I(0 < y \le x) \Big).$$
Example 7 The Sobolev space of order $m$ is (roughly speaking) the set of functions $f$ such that $\int (f^{(m)})^2 < \infty$. For $m = 2$ and $\mathcal{X} = [0,1]$ the kernel is
$$K(x, y) = \begin{cases} 1 + xy + \frac{x y^2}{2} - \frac{y^3}{6} & 0 \le y \le x \le 1 \\ 1 + xy + \frac{y x^2}{2} - \frac{x^3}{6} & 0 \le x \le y \le 1 \end{cases}$$
and
$$\|f\|_K^2 = f(0)^2 + f'(0)^2 + \int_0^1 (f''(x))^2\,dx.$$
Spectral Representation. Suppose that $\sup_{x,y} K(x, y) < \infty$. Define eigenvalues $\lambda_j$ and orthonormal eigenfunctions $\psi_j$ by
$$\int K(x, y)\,\psi_j(y)\,dy = \lambda_j\,\psi_j(x).$$
Then $\sum_j \lambda_j < \infty$ and $\sup_x |\psi_j(x)| < \infty$. Also,
$$K(x, y) = \sum_{j=1}^{\infty} \lambda_j\,\psi_j(x)\,\psi_j(y).$$
Representer Theorem. Let $\ell$ be a loss function depending on $(X_1, Y_1), \ldots, (X_n, Y_n)$ and on $f(X_1), \ldots, f(X_n)$. Let $\hat{f}$ minimize
$$\ell + g\big(\|f\|_K^2\big)$$
where $g$ is any monotone increasing function. Then $\hat{f}$ has the form
$$\hat{f}(x) = \sum_{i=1}^{n} \alpha_i\,K(x_i, x)$$
for some $\alpha_1, \ldots, \alpha_n$.
For example, in RKHS regression we minimize $\sum_i (Y_i - m(X_i))^2 + \lambda \|m\|_K^2$. By the representer theorem the minimizer has the form $\hat{m}(x) = \sum_j \hat{\alpha}_j K(X_j, x)$, and a direct computation gives
$$\hat{\alpha} = (\mathbb{K} + \lambda I)^{-1} Y$$
where $\mathbb{K}_{ij} = K(X_i, X_j)$ and $Y = (Y_1, \ldots, Y_n)^T$. The fitted values are $\hat{Y} = \mathbb{K}\hat{\alpha} = \mathbb{K}(\mathbb{K} + \lambda I)^{-1} Y$, so this estimator is a linear smoother.
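Here is a minimal sketch of this regression estimator with simulated data (the true regression function, bandwidth, and $\lambda$ are our own choices):

```python
# Kernel ridge regression: alpha-hat = (K + lambda I)^{-1} Y.
import numpy as np

rng = np.random.default_rng(2)
n, sigma, lam = 100, 0.2, 0.1
X = np.sort(rng.uniform(0, 1, n))
Y = np.sin(4 * np.pi * X) + 0.3 * rng.normal(size=n)

K = np.exp(-(X[:, None] - X[None, :])**2 / sigma**2)   # K_{ij} = K(X_i, X_j)
alpha_hat = np.linalg.solve(K + lam * np.eye(n), Y)
Y_hat = K @ alpha_hat                                   # fitted values

def m_hat(x):
    # m-hat(x) = sum_j alpha-hat_j K(X_j, x)
    return np.exp(-(x[:, None] - X[None, :])**2 / sigma**2) @ alpha_hat

print(np.mean((Y_hat - np.sin(4 * np.pi * X))**2))  # average squared error of the fit
```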
Support Vector Machines. Suppose $Y_i \in \{-1, +1\}$. Recall that the linear SVM minimizes the penalized hinge loss:
$$J = \sum_i \big[1 - Y_i(\beta_0 + \beta^T X_i)\big]_+ + \frac{\lambda}{2}\,\|\beta\|_2^2.$$
The dual optimization is to maximize
$$\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i\,\alpha_j\,Y_i\,Y_j\,\langle X_i, X_j\rangle$$
subject to $0 \le \alpha_i \le C$.
The dual is the same except that $\langle X_i, X_j\rangle$ is replaced with $K(X_i, X_j)$. This is called the kernel trick.
The Kernel Trick. This is a fairly general trick. In many algorithms you can replace $\langle x_i, x_j\rangle$ with $K(x_i, x_j)$ and get a nonlinear version of the algorithm. This is equivalent to replacing $x$ with $\Phi(x)$ and replacing $\langle x_i, x_j\rangle$ with $\langle \Phi(x_i), \Phi(x_j)\rangle$. However, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j)\rangle$ and $K(x_i, x_j)$ is much easier to compute.
In summary, by replacing $\langle x_i, x_j\rangle$ with $K(x_i, x_j)$ we turn a linear procedure into a nonlinear procedure without adding much computation.
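To make the feature-map identity concrete: for the degree-2 polynomial kernel on $\mathbb{R}^2$, $K(x, y) = (1 + \langle x, y\rangle)^2 = \langle \Phi(x), \Phi(y)\rangle$ for the explicit six-dimensional map below (one standard choice of $\Phi$; the check is numerical):

```python
# The kernel trick: (1 + <x, y>)^2 equals <Phi(x), Phi(y)>.
import numpy as np

def Phi(x):
    s = np.sqrt(2.0)
    return np.array([1.0, s * x[0], s * x[1], x[0]**2, x[1]**2, s * x[0] * x[1]])

rng = np.random.default_rng(3)
x, y = rng.normal(size=2), rng.normal(size=2)
print((1 + x @ y)**2, Phi(x) @ Phi(y))   # the same number, computed two ways
```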
Hidden Tuning Parameters. There are hidden tuning parameters in the RKHS. Consider the Gaussian kernel
$$K(x, y) = e^{-\|x - y\|^2 / \sigma^2}.$$
For nonparametric regression we minimize $\sum_i (Y_i - m(X_i))^2$ subject to $\|m\|_K \le L$. We control the bias-variance tradeoff by doing cross-validation over $L$. But what about $\sigma$? This parameter seems to get mostly ignored. Suppose we have a uniform distribution on a circle. The eigenfunctions of $K(x, y)$ are the sines and cosines. The eigenvalues $\lambda_k$ die off like $(1/\sigma)^{2k}$. So $\sigma$ affects the bias-variance tradeoff since it weights things towards lower order Fourier functions. In principle we can compensate for this by varying $L$. But clearly there is some interaction between $L$ and $\sigma$. The practical effect is not well understood.
Now consider the polynomial kernel $K(x, y) = (1 + \langle x, y\rangle)^d$. This kernel has the same eigenfunctions but the eigenvalues decay at a polynomial rate depending on $d$. So there is an interaction between $L$, $d$, and the choice of kernel itself.
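These spectra are easy to inspect empirically. The sketch below (a uniform grid standing in for a uniform design, with $\mathbb{K}/n$ approximating the integral operator) shows that a larger $\sigma$ gives faster eigenvalue decay, while the degree-$d$ polynomial kernel in one dimension has only $d + 1$ nonzero eigenvalues:

```python
# Gram-matrix spectra for Gaussian and polynomial kernels.
import numpy as np

n = 200
x = (np.arange(n) + 0.5) / n
D2 = (x[:, None] - x[None, :])**2

for sigma in (0.1, 0.5):
    lam = np.linalg.eigvalsh(np.exp(-D2 / sigma**2) / n)[::-1]
    print("Gaussian sigma =", sigma, ":", np.round(lam[:6], 4))

Kpoly = (1 + np.outer(x, x))**3 / n          # polynomial kernel, d = 3
lam = np.linalg.eigvalsh(Kpoly)[::-1]
print("Polynomial d = 3:", np.round(lam[:6], 6))   # rank d + 1 = 4
```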