Radial Basis Function Networks: The Structure of The RBF Networks
Radial basis function (RBF) networks are feed-forward networks trained using a supervised training
algorithm. They are typically configured with a single hidden layer of units whose activation function is
selected from a class of functions called basis functions. The major difference between RBF networks
and back-propagation networks (that is, multi-layer perceptrons trained by the back-propagation
algorithm) is the behavior of the single hidden layer.
The Structure of the RBF Networks
RBFs were first introduced for the solution of real multivariate interpolation problems. The structure of an
RBF network in its most basic form involves three entirely different layers: an input layer, a single hidden
layer of basis-function units, and a linear output layer.
Hidden layer
Each hidden unit i has a center ci in the input space. When an input u is presented, the unit first computes
the distance between u and its center, using the Euclidean distance:
||u − ci||
The output hi of each hidden unit i is then computed by applying the basis function G to this distance:
hi = G(||u − ci||, σi)
The basis function is a curve (typically a Gaussian function, with width corresponding to the variance σi)
which has a peak at zero distance and decreases as the distance from the center increases.
Basis function
A range of theoretical and empirical studies have indicated that many properties of the interpolating
function are relatively insensitive to the precise form of the basis functions. One of the most
commonly used basis functions is the Gaussian function:
G(||u − ci||) = exp(−||u − ci||² / (2σi²))
Output layer
The transformation from the hidden unit space to the output space is linear. The j-th output is computed as
yj = Σi=1..L wij hi,   j = 1, 2, . . . , M
Mathematical model
In summary, the mathematical model of the RBF network can be expressed as:
yj = fj(u) = Σi=1..L wij G(||u − ci||),   j = 1, 2, . . . , M
Function approximation
Let y = g(u) be a given function of u, y ∈ R, u ∈ R, g : R → R, and let Gi, i = 1, . . . , L, be a finite set of basis
functions. The function g can be written in terms of the given basis functions as
g(u) = Σi=1..L wi Gi(u) + r(u)
where r(u) is the residual. The approximation is good if the residual error, e.g.
E = ∫ r(u)² du
is small.
Approximation by RBFNN
Now, consider the single input single output RBF network shown in Figure 4. Then y can be
written as
y = f(u) + r(u),   f(u) = Σi=1..L wi G(||u − ci||)
where f(u) is the output of the RBFNN given in Figure 4 and r(u) is the residual. By setting the centers ci, the
variances σi, and the weights wi appropriately, the error can be minimized.
Data Interpolation
Given input-output training patterns (u^k, y^k), k = 1, 2, . . . , K, the aim of data interpolation is to
approximate the function y from which the data is generated. Since the function y is unknown, the
problem can be stated as a minimization problem which takes only the sample points into
consideration:
Choose wij and ci, i = 1, 2, . . . , L, j = 1, 2, . . . , M so as to minimize
E = Σk=1..K Σj=1..M (yj^k − fj(u^k))²
As an example, the output of an RBF network trained to fit the data points given in Table I is shown
in Figure 5.
TABLE I: 13 data points generated by using a sum of three Gaussians with c1 = 0.2000,
c2 = 0.6000, c3 = 0.9000, w1 = 0.2000, w2 = 0.5000, w3 = 0.3000, σ = 0.1000
data no      x        f(x)
   1      0.0500    0.0863
   2      0.2000    0.2662
   3      0.2500    0.2362
   4      0.3000    0.1687
   5      0.4000    0.1260
   6      0.4300    0.1756
   7      0.4800    0.3290
   9      0.6000    0.6694
  10      0.7000    0.4573
  11      0.8000    0.3320
  12      0.9000    0.4063
  13      0.9500    0.3535
Figure 5: Output of the RBF network trained to fit the data points given in Table I.
Note that the training problem becomes quadratic once the ci's (radial basis function centers) are known,
because the output then depends linearly on the weights.
Self-Organizing Maps
Topology Preserving Maps
A topology is a system of subsets called neighborhoods.
In a metric space (X, d), the neighborhood Oε(x) of a point x can be defined as the set of points a
such that
d(x, a) < ε
If X and Y are two topological spaces, then functions f : X → Y that preserve the topology of
X in Y are continuous functions.
Continuous functions map some neighborhood O(x) ⊆ X into any given neighborhood O(y) ⊆ Y of the
image y = f(x):
f(O(x)) ⊆ O(y)
Thus, to visualise data we need a continuous function from an m-dimensional space into a
2-dimensional space:
f : Rm → R2
Remark 1. If the data 'lives' in an m-dimensional space with m > 3, then there is no direct way to
visualise it.
Self-Organising Maps
SOM Architecture
• A SOM has m inputs; an input is a vector x = (x1, . . . , xm) ∈ Rm, and each output node j has a
weight vector wj = (w1j, . . . , wmj) in the input space.
• The nodes are arranged into a k × l grid (output space).
• The topology in the grid is defined by another metric, such as the taxi-cab
distance between grid positions i = (i1, i2) and j = (j1, j2):
d(i, j) = |i1 − j1| + |i2 − j2|
Competition
• An input vector x = (x1, . . . , xm) is compared with the weight vector wj = (w1j, . . . , wmj) of
each node by computing the distance d(x, wj):
d(x, wj) = √((x1 − w1j)² + · · · + (xm − wmj)²),   computed for every node j = 1, . . . , k × l
• The winner is the node with the weight vector wj closest to the input x (i.e. the shortest d(x, wj)).
• Thus, the nodes 'compete' in the sense of which node's weight vector wj is more 'similar' to a given input
pattern x.
Example 1. Consider a SOM with three inputs and two output nodes (A and B).
Let
wA = (2, −1, 3),   wB = (−2, 0, 1)
and let the input vector be x = (1, −2, 2).
• Node A is the winner because it is 'closer' to x:
d(x, wA) = √((1 − 2)² + (−2 + 1)² + (2 − 3)²) = √3 ≈ 1.73
d(x, wB) = √((1 + 2)² + (−2 − 0)² + (2 − 1)²) = √14 ≈ 3.74
What if x = (−1, −2, 0)?
Adaptation
After the input x has been presented to the SOM, the weights of all nodes are adapted, so that they
become more 'similar' to the input vector x.
• The adaptation formula for node j is:
wj(t + 1) = wj(t) + α hij [x − wj(t)]
where
– wj is the weight vector of node j ∈ [1, . . . , k × l];
– α is the learning rate coefficient;
– hij is the neighborhood of node j with respect to the winner i.
Adaptation (cont.)
To understand the adaptation formula better, let us check how the weights change for
different values of α and hij:
wj(t + 1) = wj(t) + α hij [x − wj(t)]
– Suppose α = 0 or hij = 0. Then
wj(t + 1) = wj(t)
and the weight does not change at all.
– Suppose α = 1 and hij = 1. Then
wj(t + 1) = wj(t) + [x − wj(t)] = x
The new weight is equal to the input x.
Cooperation
• Although the weights of all nodes are adapted, they do not adapt equally. Adaptation
depends on how close a node is to the winner in the output lattice.
• If the winner is node i, then the level of adaptation for node j is defined by the
neighborhood function hij = h(d(i, j)), where d(i, j) is the distance in the lattice.
• The neighborhood function is defined in such a way that it becomes smaller as the distance d(i, j) gets
larger. For example, the Gaussian bell function:
hij = exp(−d(i, j)² / (2σ²))
• The winner 'helps' mostly its neighbours to adapt. Note also that the winner is
adapted more than any other node (because d(i, i) = 0, where the neighborhood function takes its
largest value).
Example 2. Let α = 0.5 and hij = 1, and let us adapt the winning node A from the previous
example:
wA = (2, −1, 3),   x = (1, −2, 2)
Applying the adaptation formula:
wA(t + 1) = wA + 0.5 · 1 · [x − wA] = (2, −1, 3) + 0.5 · (−1, −1, −1) = (1.5, −1.5, 2.5)