
Approximation of Functions

Ravi Kothari, Ph.D.
ravi.kothari@ashoka.edu.in

"I think that it is a relatively good approximation to truth -- which is much too complicated to allow anything but approximations -- that mathematical ideas originate in empirics." - John von Neumann


Approximation of Functions

Let y = f(x) be given on the interval [x0, x2]. Let x1 be a point such that x0 < x1 < x2.
y0 = f(x0), y1 = f(x1), y2 = f(x2)
Let us say we want to approximate f(x). A polynomial of second degree seems appropriate to approximate f(x), i.e.,

    P(x) = a_0 + a_1 x + a_2 x^2    (1)

We want to find the coefficients of P(x) such that P(x0) = y0, P(x1) = y1, and P(x2) = y2.
We can of course solve this exactly, since there are 3 equations and 3 unknowns.
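A minimal sketch of this direct approach (assuming NumPy; the sample points and the use of sin as a stand-in for the unknown f(x) are illustrative, not from the slides): build the 3x3 system from the three points and solve for a_0, a_1, a_2.

```python
import numpy as np

# Three illustrative sample points (not from the slides)
x_pts = np.array([0.0, 1.0, 2.0])
y_pts = np.sin(x_pts)          # stand-in for the unknown f(x)

# System: a0 + a1*x_i + a2*x_i^2 = y_i for i = 0, 1, 2
A = np.vander(x_pts, N=3, increasing=True)   # columns: 1, x, x^2
a0, a1, a2 = np.linalg.solve(A, y_pts)

P = lambda x: a0 + a1 * x + a2 * x ** 2
print(P(x_pts))   # matches y_pts at the interpolation points
```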


Let us approach it differently and construct a polynomial Q0(x) of second degree such that Q0(x0) = 1, Q0(x1) = 0, Q0(x2) = 0. Likewise Q1(x0) = 0, Q1(x1) = 1, Q1(x2) = 0, and Q2(x0) = 0, Q2(x1) = 0, and Q2(x2) = 1.
The desired polynomials are,

    Q_0(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}
    Q_1(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}
    Q_2(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}    (2)

The interpolating polynomial is then,

    P(x) = y_0 Q_0(x) + y_1 Q_1(x) + y_2 Q_2(x)    (3)
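A small sketch of this construction in code (assumptions: NumPy, and the same illustrative nodes and sin stand-in as before). Each Q_i is 1 at x_i and 0 at the other two nodes, and P(x) is their weighted sum.

```python
import numpy as np

x0, x1, x2 = 0.0, 1.0, 2.0                 # illustrative nodes
y0, y1, y2 = np.sin([x0, x1, x2])          # stand-in values of f

def Q0(x): return (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
def Q1(x): return (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
def Q2(x): return (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))

def P(x):
    # Equation (3): weighted sum of the basis polynomials
    return y0 * Q0(x) + y1 * Q1(x) + y2 * Q2(x)

print(P(x0), P(x1), P(x2))   # reproduces y0, y1, y2
```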


This interpolation polynomial of degree 2 is unique.
Proof:
  - Let P1(x) be another such polynomial. Then P(x) = P1(x) for x = x0, x = x1, x = x2.
  - So P(x) - P1(x) is a polynomial of degree at most 2 that vanishes at three distinct values of x.
  - Hence P(x) - P1(x) must be identically 0, i.e., P(x) = P1(x).
Obviously, the polynomial P(x) differs from f(x) at values of x other than x0, x1, and x2.


In the general case,

    P_n(x) = \sum_{k=0}^{n} f(x_k) \frac{(x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}    (4)

Weierstrass proved that polynomials can approximate arbitrarily well any continuous real function on an interval.
Equation (4) corresponds to an approximation of f(x) using a superposition of simpler functions.
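A compact sketch of Equation (4) (assuming NumPy; the nodes and the sin stand-in for f are illustrative):

```python
import numpy as np

def lagrange_interpolate(x, nodes, values):
    """Evaluate P_n(x) from Equation (4) at a scalar or array x."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    n = len(nodes)
    for k in range(n):
        # Product over all j != k of (x - x_j) / (x_k - x_j)
        term = np.ones_like(x)
        for j in range(n):
            if j != k:
                term *= (x - nodes[j]) / (nodes[k] - nodes[j])
        result += values[k] * term
    return result

nodes = np.linspace(0.0, 2.0, 5)                 # illustrative nodes
vals = np.sin(nodes)                             # stand-in for f(x_k)
print(lagrange_interpolate(nodes, nodes, vals))  # reproduces vals at the nodes
```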


Generalizing

Let f(x) be a real function of a real-valued vector x = [x_1 x_2 ... x_n]^T that is square integrable over the real numbers.
The goal of function approximation is to describe the behavior of f(x) in a compact area S of the input space using a superposition of simpler functions φ_i(x, w), i.e.,

    \hat{f}(\phi(x, w), W) = \sum_{i=1}^{\tilde{n}} W_i \phi_i(x, w)    (5)

where the W_i's are real-valued constants such that,

    |f(x) - \hat{f}(\phi(x, w), W)| < \epsilon    (6)

and ε can be arbitrarily small.
So we obtain the value of f(x) for x ∈ S from a combination of simpler or elementary functions {φ_i(x, w)}.
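A sketch of Equation (5) in code (assumptions: NumPy, Gaussian bumps as the elementary functions φ_i, and arbitrary illustrative weights; the slides do not fix a particular choice of φ here):

```python
import numpy as np

def phi(x, centers, width):
    """Elementary functions: one Gaussian bump per center (an assumed choice,
    not the slides'). Returns an array with one value per phi_i."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(-2.0, 2.0, 5)        # the 'w' parameters of the phi_i
W = np.array([0.2, -0.5, 1.0, 0.3, -0.1])  # illustrative real-valued weights

def f_hat(x):
    # Equation (5): weighted superposition of the elementary functions
    return W @ phi(x, centers, width=0.5)

print(f_hat(0.0))
```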
Generalizing

There are many possible choices for {φ_i(x, w)}. The polynomial we saw before is one possibility.
We would prefer one set of {φ_i(x, w)} over another if it provides a smaller error for a given h or is computationally more efficient.
As a side note, observe that if we make the number of inputs equal to the number of elementary functions {φ_i(x, w)}, i = 1, ..., ñ, then,

    \begin{bmatrix}
    \phi_1(x^{(1)}, w) & \phi_2(x^{(1)}, w) & \ldots & \phi_{\tilde{n}}(x^{(1)}, w) \\
    \phi_1(x^{(2)}, w) & \phi_2(x^{(2)}, w) & \ldots & \phi_{\tilde{n}}(x^{(2)}, w) \\
    \vdots & & & \vdots \\
    \phi_1(x^{(\tilde{n})}, w) & \phi_2(x^{(\tilde{n})}, w) & \ldots & \phi_{\tilde{n}}(x^{(\tilde{n})}, w)
    \end{bmatrix}
    \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_{\tilde{n}} \end{bmatrix}
    =
    \begin{bmatrix} f(x^{(1)}) \\ f(x^{(2)}) \\ \vdots \\ f(x^{(\tilde{n})}) \end{bmatrix}    (7)

  - ...and W = φ^{-1} f (assuming the inverse exists!)
  - An important condition must therefore be placed on the elementary functions: the inverse of φ must exist, as the sketch below illustrates.
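A sketch of this square-system case (assumptions: NumPy, the same assumed Gaussian φ_i as in the earlier sketch, and sin as the stand-in target). In practice one solves the system directly rather than forming the inverse explicitly.

```python
import numpy as np

def f(x):
    return np.sin(x)                       # stand-in for the unknown function

centers = np.linspace(-2.0, 2.0, 6)        # one elementary function per center
x_samples = centers.copy()                 # number of inputs == number of phi_i

def phi_matrix(xs, centers, width=0.5):
    # Entry (l, i) = phi_i(x^(l), w): a Gaussian bump (an assumed choice)
    return np.exp(-((xs[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

Phi = phi_matrix(x_samples, centers)       # the matrix in Equation (7)
W = np.linalg.solve(Phi, f(x_samples))     # W = phi^{-1} f, assuming invertibility

f_hat = phi_matrix(x_samples, centers) @ W
print(np.allclose(f_hat, f(x_samples)))    # True: exact at the sample points
```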
Geometric Interpretation

Equation (5) describes a projection of f(x) onto a set of basis functions {φ_i(x, w)}. The basis functions define a manifold, and \hat{f}(\phi(x, w), W) is the image or projection of f(x) in this manifold.


Choices for Elementary Functions

If the elementary functions are not chosen properly, then there will always be an error no matter how large ñ is.
One requirement that we have seen is that φ^{-1}(·) must exist. This condition is met if the elementary functions constitute a basis, i.e., are linearly independent.
Fourier series and wavelets are two widely used bases.
In neural networks, (i) the bases are dependent on the data (as opposed to being fixed), and (ii) the coefficients (weights) are adapted as opposed to analytically computed.
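A quick sketch of checking the linear-independence condition numerically (assuming NumPy; the example columns are illustrative): if the matrix of elementary functions evaluated at the sample points is rank deficient, φ^{-1} does not exist.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 4)

# Independent columns: 1, x, x^2, x^3
good = np.vander(xs, N=4, increasing=True)

# Dependent columns: the last one is 2*x + 1, a combination of the first two
bad = np.column_stack([np.ones_like(xs), xs, xs ** 2, 2 * xs + 1])

print(np.linalg.matrix_rank(good), np.linalg.cond(good))  # full rank
print(np.linalg.matrix_rank(bad))                          # rank 3: not a basis
```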


When f(x) is non-linear, there is no natural choice of basis. Volterra expansions, splines, etc. are some bases that have been tried. As we saw before, Weierstrass showed that polynomials are universal approximators.
The difficulty is that either too many terms are required or the approximation is not well behaved, as the sketch below illustrates.
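A small sketch of the "not well behaved" issue (assumptions: NumPy, Runge's classic test function 1/(1 + 25x^2), and equally spaced nodes; none of these specifics are from the slides): as the degree grows, the interpolating polynomial develops large oscillations near the interval ends.

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + 25.0 * x ** 2)    # Runge's example, an assumed test case

for n in (5, 10, 15):
    nodes = np.linspace(-1.0, 1.0, n + 1)
    coeffs = np.polyfit(nodes, f(nodes), deg=n)   # degree-n interpolant
    xs = np.linspace(-1.0, 1.0, 1001)
    err = np.max(np.abs(np.polyval(coeffs, xs) - f(xs)))
    print(n, err)   # the maximum error grows with the degree
```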


Multi-Layered Perceptrons (MLP)

Multi-layered perceptrons (MLPs) often use a basis of sigmoidal functions.
The hidden layer changes the basis, and consequently the manifold, and the output layer finds the best projection within the manifold.


Recall,

    h_j^{(l)} = \sigma\left(S_j^{(l)}\right) = \sigma\left(\sum_{k=0}^{n} w_{jk} x_k^{(l)}\right)    (8)

    \hat{y}_i^{(l)} = \sigma\left(S_i^{(l)}\right) = \sigma\left(\sum_{j=0}^{\tilde{n}} W_{ij} h_j^{(l)}\right)    (9)

where σ(a) = 1/(1 + e^{-a}) is the sigmoid function.

Training is done by optimizing,

    J = \sum_{l=1}^{N} J^{(l)} = \sum_{l=1}^{N} \sum_{i=1}^{m} \left( y_i^{(l)} - \hat{y}_i^{(l)} \right)^2    (10)
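A minimal sketch of Equations (8)-(10) (assumptions: NumPy, one hidden layer, randomly initialized weights, and a tiny random batch; the bias is folded in as x_0 = 1 and h_0 = 1, matching the sums starting at k = 0 and j = 0):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n, n_hidden, m = 3, 5, 2                      # inputs, hidden units, outputs
w = rng.normal(size=(n_hidden, n + 1))        # hidden weights w_jk (col 0: bias)
W = rng.normal(size=(m, n_hidden + 1))        # output weights W_ij (col 0: bias)

def forward(x):
    x_aug = np.concatenate(([1.0], x))        # x_0 = 1
    h = sigmoid(w @ x_aug)                    # Equation (8)
    h_aug = np.concatenate(([1.0], h))        # h_0 = 1
    return sigmoid(W @ h_aug)                 # Equation (9)

# Squared-error cost of Equation (10) over a small random batch
X = rng.normal(size=(4, n))
Y = rng.uniform(size=(4, m))
J = sum(np.sum((y - forward(x)) ** 2) for x, y in zip(X, Y))
print(J)
```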


Approximation Capabilities of MLPs

Let f(x) be the function to be approximated. Further, let us assume that l uniformly sampled points (x^{(1)}, x^{(2)}, ..., x^{(l)}) in (a, b) are known. Let y^{(i)} = f(x^{(i)}).
x^{(i+1)} - x^{(i)} = Δx = (b - a)/l, with x^{(1)} - Δx/2 = a and x^{(l)} + Δx/2 = b.
Define a function,

    \zeta(x) = \frac{1}{2}\,\mathrm{sgn}(x) + \frac{1}{2} =
    \begin{cases}
    0 & x < 0 \\
    \text{undefined} & x = 0 \\
    1 & x > 0
    \end{cases}    (11)

Then,

    f(x) \approx \sum_{i=1}^{l} y^{(i)} \left[ \zeta\left(x - x^{(i)} + \frac{\Delta x}{2}\right) - \zeta\left(x - x^{(i)} - \frac{\Delta x}{2}\right) \right]    (12)

Now,

    \zeta\left(x - x^{(i)} + \frac{\Delta x}{2}\right) - \zeta\left(x - x^{(i)} - \frac{\Delta x}{2}\right)
    = \frac{1}{2}\,\mathrm{sgn}\left(x - x^{(i)} + \frac{\Delta x}{2}\right) - \frac{1}{2}\,\mathrm{sgn}\left(x - x^{(i)} - \frac{\Delta x}{2}\right)    (13)


Each term of the summation above can be produced by,

[Figure omitted: a rectangular pulse of height y^{(i)} and width Δx centered at x^{(i)}]

We can replace sgn(·) with a steep sigmoid.
Each vertical bar can then be produced by a pair of neurons, and we can approximate f(x) to any desired degree of accuracy, as the sketch below illustrates.
  - Many authors (e.g., Cybenko, Hornik and others) have formally shown that the function realized by a single-hidden-layer neural network with a finite number of neurons in the hidden layer is dense in the space of continuous functions.
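A sketch of this construction (assumptions: NumPy, a steepness factor of 50, and sin as the target function; these choices are illustrative): each pair of steep sigmoids produces one near-rectangular bump of Equation (12).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a, b, l = 0.0, 2.0 * np.pi, 50           # interval and number of samples
dx = (b - a) / l
x_i = a + dx / 2 + dx * np.arange(l)     # uniformly sampled points in (a, b)
y_i = np.sin(x_i)                        # stand-in for f(x^(i))

def f_hat(x, steepness=50.0):
    # Pair of steep sigmoids per sample point: approximates
    # zeta(x - x_i + dx/2) - zeta(x - x_i - dx/2) from Equations (12)-(13)
    left = sigmoid(steepness * (x - (x_i - dx / 2)))
    right = sigmoid(steepness * (x - (x_i + dx / 2)))
    return np.sum(y_i * (left - right))

xs = np.linspace(a, b, 200)
err = max(abs(f_hat(x) - np.sin(x)) for x in xs)
print(err)   # small, and shrinks as l grows and the sigmoids get steeper
```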


Sigmoidal Neuron With 1 Input

[Figure: output of a single sigmoidal neuron for three weight settings: w0 = 0.5, w1 = 0.1; w0 = -5, w1 = 0.8; and w0 = -1, w1 = -0.1]
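A sketch reproducing a figure like this one (assumptions: NumPy and matplotlib; reading the legend as a neuron computing σ(w0 + w1 x) is an interpretation, not stated on the slide):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-10.0, 10.0, 400)
for w0, w1 in [(0.5, 0.1), (-5.0, 0.8), (-1.0, -0.1)]:
    plt.plot(x, sigmoid(w0 + w1 * x), label=f"w0 = {w0}, w1 = {w1}")

plt.legend()
plt.title("Sigmoidal neuron with 1 input")
plt.show()
```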


Superposition of 4 Sigmoidal Neurons Each With 1 Input

[Figure: the outputs of 4 sigmoidal neurons (Neuron 1 through Neuron 4) and their weighted superposition, producing a more complex curve]
Classification

Though it is possible to set up a different cost function (e.g., cross entropy) or approach classification in a different way, thresholding the approximated output also results in classification.
For example,

[Figure: the superposition (output) plotted against x together with a horizontal threshold line; inputs whose output exceeds the threshold are assigned to one class]
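A minimal sketch of classification by thresholding (assumptions: NumPy, some already-approximated outputs such as those from the earlier sketches, and an illustrative threshold of 0.0):

```python
import numpy as np

def classify(outputs, threshold=0.0):
    """Assign class 1 where the approximated output exceeds the threshold,
    class 0 otherwise. The threshold value is illustrative."""
    return (np.asarray(outputs) > threshold).astype(int)

outputs = np.array([0.35, -0.12, 0.02, -0.4])   # approximated network outputs
print(classify(outputs))                         # [1 0 1 0]
```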
