Radial Basis Network: An Implementation of Adaptive Centers

Nivas Durairaj
Final Project for ECE539
Table of Contents

Table of Contents
List of Figures
Introduction
Background
Methodology & Development of Program
    Adaptation Formulas
Testing & Comparison of Results
    Sinusoid Function Testing
    Piecewise-Linear Function
    Polynomial Function
Conclusion of Results
Appendix
    Manual for rbn_adaptive.m
    Manual for rbn_fixed_selfgen.m
    Derivation of Partial Derivatives (Adaptive RBF Network)
        Linear Weights Partial Derivative Term
        Positions of Centers Partial Derivative Term (hidden layer)
        Spreads of Centers Partial Derivative Term (hidden layer)
    Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
References
List of Figures
Introduction
What neural network model offers the same benefits as a feedforward neural network? The Radial Basis Function (RBF) network. Like feedforward networks such as the multilayer perceptron trained with backpropagation, the radial basis function network aids us in function approximation, classification, and modeling of dynamic systems. RBF networks have been used to produce results in stock market prediction and speech recognition.
I chose to implement my Intro to Artificial Neural Networks project on RBFs (Radial Basis Functions) because they are still an active research area and there is a lot to be learned from them. These functions were first introduced for solving multivariate interpolation problems, and they are now one of the main fields of research in numerical analysis. Since I was well acquainted with simple feedforward networks, I decided to implement an RBF network with adaptive centers. In addition, I have some interest in economics, and the thought of producing an algorithm that could help predict the stock market was very appealing to me.
Background
In its most basic form, an RBF consists of three layers with entirely different
roles. The input layer is made up of nodes that connect the network to its environment.
The second layer is the hidden layer of neurons. At the input of each neuron, the distance
between the neuron center and the input vector is calculated. By applying the radial basis
function (Gaussian bell function) to this distance, the output of the neuron is formed.
The last layer is the output layer; it is linear and supplies the response of the network to the activation pattern applied to the input layer. The rationale for a nonlinear transformation followed by a linear transformation is given by Cover's theorem on the separability of patterns, discussed in [1]: a pattern-classification problem cast in a high-dimensional space is more likely to be linearly separable than one cast in a low-dimensional space. This is the reason for making the dimension of the hidden space in an RBF network high. It is also important to note that the higher the dimension of the hidden space, the more accurate the smoothed approximation of the input-output mapping will be.
Radial basis function networks can follow different learning strategies. Their linear output-layer weights tend to evolve on a different time scale than the nonlinear hidden-layer activation functions, so it is best to optimize the two layers on different time scales. The learning strategies differ mostly in how the centers of the radial-basis functions of the network are specified. My project is based on the particular learning strategy known as supervised selection of centers; such an RBF network is founded on interpolation theory.
The easiest approach is to assume fixed radial-basis functions when defining the activation functions of the hidden units. However, with additional computation, one can create an RBF network whose function centers undergo a supervised learning process.
The cost function to be minimized is

E = \frac{1}{2} \sum_{j=1}^{N} e_j^2

where the error signal is

e_j = d_j - F^*(x_j), \qquad F^*(x_j) = \sum_{i=1}^{M} w_i \, G(\lVert x_j - t_i \rVert_{C_i})

Here N is the size of the training sample, e_j is the error signal, and \lVert \cdot \rVert_{C_i} is a weighted Euclidean distance or norm.
F^*(x_j) is built from Green's functions. Green's functions play an important role in the solution of linear ordinary and partial differential equations, and they are also a key component in the development of integral-equation methods. In this network, the Green's function takes the Gaussian form

G(\lVert x_j - t_i \rVert_{C_i}) = \exp\left( -(x_j - t_i)^T C_i^T C_i (x_j - t_i) \right)

Substituting \frac{1}{2} \Sigma_i^{-1} = C_i^T C_i, the Green's function becomes

G(\lVert x_j - t_i \rVert_{C_i}) = \exp\left( -0.5 \, (x_j - t_i)^T \Sigma_i^{-1} (x_j - t_i) \right)

where m is the feature dimension of t and x. Thus, the Green's function results in a single number: a 1xm vector times an mxm matrix times an mx1 vector gives a 1x1 number.
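To make those dimensions concrete, here is a small MATLAB sketch (illustrative only; the variable names xj, ti, and covinv_i are mine, not from the project code) showing that the quadratic form inside the exponential reduces to a scalar:

% Illustrative only: evaluate the Gaussian Green's function for one
% sample xj and one center ti with inverse covariance covinv_i (m = 2).
xj       = [0.3 -0.1];               % 1 x m input vector
ti       = [0.1  0.2];               % 1 x m center vector
covinv_i = eye(2);                   % m x m inverse covariance (spread)
q = (xj - ti)*covinv_i*(xj - ti)';   % (1xm)*(mxm)*(mx1) -> 1x1 scalar
g = exp(-0.5*q);                     % Green's function value, a single number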
As seen from the above, we need to find the parameters w_i, t_i, and \Sigma_i^{-1} that minimize the cost function. The adaptation formulas for the linear weights, positions of centers, and spreads of centers of the RBF network are given below; I obtained this information from Haykin, page 303 [1]. The derivations of the partial derivatives are given in the appendix.
Adaptation Formulas

1. Linear weights (output layer)

\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i(n) \rVert_{C_i})

w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)}, \qquad i = 1, 2, \ldots, c

2. Positions of centers (hidden layer)

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i(n) \rVert_{C_i}) \, \Sigma_i^{-1} [x_j - t_i(n)]

t_i(n+1) = t_i(n) - \eta_2 \frac{\partial E(n)}{\partial t_i(n)}, \qquad i = 1, 2, \ldots, c

3. Spreads of centers (hidden layer)

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i(n) \rVert_{C_i}) \, [x_j - t_i(n)][x_j - t_i(n)]^T

\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)}, \qquad i = 1, 2, \ldots, c
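The spread update code is shown below and the center update appears in the appendix; the linear-weight update follows the same pattern. As an illustration only (a sketch, not the program's actual code), adaptation formula 1 could be coded in the same style as follows:

%Calculation of linear weights (output layer) -- illustrative sketch only,
%written in the same style as the excerpts below; not the original program.
for i=1:c
    weightdiff=0;
    for j=1:n
        g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
        weightdiff=weightdiff + e(j)*g;   %accumulate dE/dw_i over the samples
    end
    w(i)=w(i) - eta1*weightdiff;          %w_i(n+1) = w_i(n) - eta1*dE/dw_i
end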
The positions of the centers were computed in a similar way; however, t_i is a vector that spans R^m, where m is the feature dimension. The spreads of the centers were output in matrix form, as expected, since the inverse covariance being updated is an m-by-m matrix.
%Calculation of Spreads of centers (hidden layer)
spreaddiff=0;
for j=1:n
    %Green's function value for sample j and center i
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    %accumulate e_j * G * (x_j - t_i)(x_j - t_i)^T over all samples
    spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
end
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff); %mxm matrix
Regarding the power of Matlab, I probably should have coded the above using matrix and vector operations, since a for loop in Matlab incurs a lot of overhead. However, since I am more used to C, I implemented it as I would in C to avoid confusion in my calculations. Therefore, I believe this program can be further optimized to make full use of Matlab; one possible vectorized form is sketched below.
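As an illustration of that point, here is one possible vectorized form of the spread update above (this is not part of the original program, and it assumes e is an n-by-1 column vector of errors):

% Hypothetical vectorized spread update for center i (illustrative sketch).
D = x - repmat(t(i,:),n,1);                  % n x m matrix of (x_j - t_i) rows
g = exp(-0.5*sum((D*covinv(:,:,i)).*D,2));   % n x 1 Green's function values
spreaddiff = D'*diag(e.*g)*D;                % m x m sum of weighted outer products
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff); %same update as the loop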
According to Haykin, a few points need to be understood when dealing with an adaptive-center RBF network. The cost function is convex with respect to the linear weights w_i, but it is nonconvex with respect to the centers t_i and the spreads \Sigma_i^{-1}, so the search for the optimum values of t_i and \Sigma_i^{-1} can get stuck in a local minimum.
To prevent infinite values, it is sometimes better to begin the search from a structured initial condition that limits the parameter space to a known area. Before running the RBF network, it may be useful to run the data through a standard pattern classifier first; this reduces the chance of converging on a local minimum.
The algorithm begins with the parameters w, t, and \Sigma^{-1}, which are initialized as described below. It was very important that I set these variables to values that would allow the network to run with minimum error. At first, I initialized w to w=0.005*randn(c,1). Unfortunately, this was not a good way of initializing w: my RBF network produced results that were flagrantly incorrect, and I could not find eta parameters that fixed it. Since I was trying to produce an RBF network comparable to a fixed-center RBF, I decided to set my initial weights to w=pinv(G)*d. This improved my results immensely because my weights were limited to a known area. The vector t was initialized using the k-means algorithm. \Sigma^{-1} was initialized to a stack of identity matrices of size m by m by c, where m is the number of features and c is the number of cluster centers. I thought this was a good starting point since it reduced the chance of getting stuck in a local minimum at initialization itself. A sketch of this initialization is given below.
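For illustration, a minimal sketch of this initialization, assuming the helper routines cinit.m, kmeansf.m, and gauss.m behave as in the appendix listings (this is not the original code):

% Illustrative initialization sketch (assumes x is n x m, d is n x 1, and
% c centers; helper functions cinit.m, kmeansf.m, gauss.m from the project).
[n,m] = size(x);
t = cinit(x,2,c);              % spread initial cluster centers over the range
t = kmeansf(x,t,.0001,50);     % refine the centers with k-means
covinv = zeros(m,m,c);
for i=1:c
    covinv(:,:,i) = eye(m);    % spreads start as identity inverse covariances
end
G = gauss(x,t,covinv);         % n x c Green's (Gaussian) matrix
w = pinv(G)*d;                 % initial weights, as in the fixed-center RBF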
[Figure: Training set samples]

Testing & Comparison of Results

[Figures: RBF network outputs — test samples, approximated curve, train samples, and radial basis centers]
To see if I could reduce the cost of the adaptive-center RBF network, I tried modifying the eta parameters from their starting value of 0.5. My conclusion was that modifying the eta parameters can reduce the costs, but they may not be significantly lower than the costs of a fixed-center RBF network. A few representative settings are shown below.
Eta1    Eta2    Eta3    Cost
0.3     0.3     0.3     0.403
0.2     0.5     0.9     0.403
0.8     0.2     0.3     0.404
Using Dr. Hu's function generator, I was able to generate a few functions to test my RBF networks on. I wanted to see whether a certain type of RBF network would actually perform better in certain situations. The function generator produced training and testing data for three functions: sinusoid, piecewise-linear, and polynomial. I used these three functions to compare the results of the two RBF networks.

Sinusoid Function Testing
[Figure 6: RBF network output for the sinusoid function with 7 radial basis functions — test samples, approximated curve, train samples, radial basis centers]
Testing the radial basis function networks against the sinusoid data, the results seemed to show that with fewer radial basis functions, the adaptive-center RBF network performs slightly better. Beyond that, the fixed-center RBF network achieves results that are similar, if not better. As a side note, we can probably disregard the cost output for two radial basis functions, since two is too few to correctly match the sinusoid function. The data behind this comparison is given in the appendix.
Piecewise-Linear Function
[Figure: RBF with adaptive centers — piecewise-linear function (test samples, approximated curve, train samples, radial basis centers)]

[Figure: RBF network output for the piecewise-linear function (test samples, approximated curve, train samples, radial basis centers)]
For this function, the adaptive-center RBF network performed better until the number of radial basis functions reached 6. After 6, the fixed-center RBF network began to obtain better results. I stopped compiling the cost outputs at 10 radial basis functions, as the differences were on the order of 10^-7. Nevertheless, at 9 radial basis functions, both the adaptive-center and fixed-center network models were providing similar approximations of the piecewise-linear function. At 10 radial basis functions, the adaptive-center RBF network provided the best model, with a cost function output of 3.7823x10^-7. Data for this comparison is given in the appendix.
Polynomial Function
[Figure 11: Adaptive-center RBF network for the polynomial function with 6 radial basis functions (test samples, approximated curve, train samples, radial basis centers)]
The adaptive-center RBF network was clearly the winner in approximating the polynomial function. I ran it a number of times, but I stopped at 6 radial basis functions, as the cost function gave me an output of 4.1883x10^-12. The cost function results were too small for Excel to plot on the chart; however, the relevant data can be found in the appendix.
Conclusion of Results
Depending on the application, RBF networks can gain a lot by adapting the positions of the centers of the radial-basis functions. For example, in speech recognition it was found that when a minimal network was required, it was beneficial to use an RBF network with nonlinear optimization of the parameters defining the hidden-layer activation functions. However, it was also true that a bigger RBF network with more fixed centers could attain similar performance.
From my results, I can say that an RBF network with adaptive centers can perform a little better than a fixed-center RBF network. If fewer radial basis functions are required, then the RBF network with adaptive centers would probably work best. However, an RBF with fixed centers may prove more useful in other cases. With respect to my adaptive-center RBF network program, the fixed-center RBF network computed results faster; my program took longer since it had to update each individual weight, cluster-center vector, and inverse covariance matrix. I also spent a lot of time adjusting the eta values in the adaptive-center model to prevent infinite values, which was a major advantage of the fixed-center RBF network. To optimize the adaptive RBF network program, I would probably have to implement it using matrix and vector operations instead of loops. In conclusion, both RBF network models are important, and one cannot rightly say that a particular model is better unless the situation is known.
I learned a lot from programming the adaptive-center RBF network. Although the programming was not very difficult, I had to understand the equations of the supervised selection of centers algorithm. This took some time, since I sometimes received outputs with incorrect dimensions (e.g., matrices instead of vectors). The project gave me a chance to appreciate the beauty of neural networks, and I enjoyed completing it.
APPENDIX
Manual for rbn_adaptive.m
%
% rbn_adaptive.m - RBF demonstration program of Supervised Selection of
% Centers
% Based on RBNdemo by Dr. Yu Hen Hu
% calls fungenf.m, cinit.m, gauss.m, kmeansf.m
%
% Data points in matrix x (n by k)
% cluster centers in matrix t (v by m)
%
% n: number of samples
% v: size of t
% k, m: dimension of feature space
% c: number of radial basis functions used
% spread of center - spread matrix
% G - Green's matrix
% Specify:
% eta1, eta2, eta3
%
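The script prompts for its inputs interactively. As an illustration only, a session might look like the following; the prompts are inferred from the fixed-center script rbn_fixed_selfgen.m listed later and from the sample sizes used in the appendix data, so the exact wording may differ:

% Hypothetical MATLAB session for rbn_adaptive.m (illustrative only):
% >> rbn_adaptive
% # of training samples = 20
% # of testing samples = 50
% 1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: 3
% number of radial basis functions used: 6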
%Calculation of Positions of centers (hidden layer)
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
end
t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)'; %1 x m update of center i

[c,n]=size(mint);
% note that sigma is n by n by c
% fhat=w(1)*ones(size([x;y]));
% plot the approximation using the saved parameters mint, minw, mincovinv
fhat=gauss([x;y],mint,mincovinv)*minw;
fd=gauss(mint,mint,mincovinv)*minw;
figure(2),%subplot(122)
plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',mint,fd,'dr'),
legend('test samples','approximated curve','train samples','radial basis',0)
title('RBF Network with Adaptive Centers');
end
Manual for rbn_fixed_selfgen.m
%
% rbn_fixed_selfgen.m - RBF network with fixed centers (self-generated data)
%
clear all,
close all;
% generate 2D data trainf, testf
Nr=input('# of training samples = ');
Nt=input('# of testing samples = ');
% generate the training and testing data samples
funtype=input('1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: ');
switch funtype
case 1 % a sinusoidal signal is to be generated
tp=[.7 -.2]; % y = cos(4*pi*0.7*x + (-.2))
case 2 % piecewise linear function
tp=[-.5 0 -.1 .2 .1 .2 .3 1 .5 0];
case 3 % polynomial specified by roots
tp=[2 -.3 0 0.2];
end
xgen=0;
% only regularly spaced data samples are generated
xorder=2; % training and testing data are evenly interlaced
[trainf,testf]=fungenf(Nr,Nt,xgen,funtype,tp,xorder);
x=trainf(:,1); d=trainf(:,2);
xmean=mean(x); % xmean is 1 by n
y=testf(:,1); yd=testf(:,2);
[k,n]=size(x); % k: # of samples, n: dim of feature space
for type=2:2,
% determine radial basis centers and cluster numbers
if type==1,
xi=x; c=k;
elseif type==2;
% decide # of radial basis functions
%figure(1),subplot(122),plot(x,d,'o'),axis square,drawnow
c=input('number of radial basis functions used: ');
xi=cinit(x,2,c); % spread initial cluster center over entire range
xi=kmeansf(x,xi,.0001,50);
end
% find weights w, and approximated curve fhat
if type==1,
lambda=input('smoothing parameter, lambda (>=0) = ');
elseif type==2,
lambda=0;
[w,xi,sigma, G, G0]=rbn(x,d,xi,lambda,2);
% the rbn.m routine may change the # of clusters!
[c,n]=size(xi);
% note that sigma is n by n by c
% fhat=w(1)*ones(size([x;y]));
fhat=gauss([x;y],xi,sigma)*w;
fd=gauss(xi,xi,sigma)*w;
figure(1),%subplot(122)
plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',xi,fd,'dr'),
legend('test samples','approximated curve','train samples','radial basis',0)
title('RBN with fixed centers')
%Cost function added to evaluate the RBF Network with Fixed Centers
costd=[d;yd];
e=costd-fhat;
cost=0;
for j=1:length(e)   %sum the squared errors over all training and test samples
cost=cost+e(j)^2;
end
%Actual cost function
cost=0.5*cost
end
end
Derivation of Partial Derivatives (Adaptive RBF Network)

Linear Weights Partial Derivative Term
E = \frac{1}{2} \sum_{j=1}^{N} e_j^2, \qquad e_j = d_j - F^*(x_j)

where

F^*(x_j) = \sum_{i=1}^{M} w_i \, G(\lVert x_j - t_i \rVert_{C_i})

Differentiating the cost function with respect to the linear weight w_i leaves only the Green's function term, so

\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i})

Positions of Centers Partial Derivative Term (hidden layer)

\frac{\partial E(n)}{\partial t_i} = \sum_{j=1}^{N} e_j \frac{\partial e_j}{\partial t_i}, \qquad \frac{\partial e_j}{\partial t_i} = -w_i \frac{\partial}{\partial t_i} G(\lVert x_j - t_i \rVert_{C_i})

Differentiating the Green's function with respect to t_i, using

\lVert x_j - t_i \rVert_{C_i}^2 = (x_j - t_i(n))^T \Sigma_i^{-1} (x_j - t_i(n))

produces the factor \Sigma_i^{-1}[x_j - t_i(n)]. Therefore,

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, \Sigma_i^{-1} [x_j - t_i(n)]
Spreads of Centers Partial Derivative Term (hidden layer)
\frac{\partial E(n)}{\partial \Sigma_i^{-1}} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial \Sigma_i^{-1}} = \sum_{j=1}^{N} e_j \frac{\partial e_j}{\partial \Sigma_i^{-1}}, \qquad \frac{\partial e_j}{\partial \Sigma_i^{-1}} = -w_i \frac{\partial}{\partial \Sigma_i^{-1}} G(\lVert x_j - t_i \rVert_{C_i})

where

\frac{\partial}{\partial \Sigma_i^{-1}} \lVert x_j - t_i \rVert_{C_i}^2 = [x_j - t_i(n)][x_j - t_i(n)]^T

Therefore,

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, [x_j - t_i(n)][x_j - t_i(n)]^T
Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
Polynomial Function
# of Training Samples - 20
# of Testing Samples - 50
Eta parameters were changed a few times to prevent convergence at local minima.
Usually, eta1=eta2=eta3=0.000001
Cost Function Outputs
Piecewise-Linear Function
# of Training Samples - 10
# of Testing Samples - 40
Eta parameters were changed a few times to prevent convergence at local minima.
Usually, eta1=eta2=eta3=0.000001
Cost Function Outputs
References
[1] Haykin, S., Neural Networks: A Comprehensive Foundation, New Jersey, Prentice Hall, 1994.
[2] Hu, Yu Hen, Introduction to Neural Networks and Fuzzy Systems. Retrieved October 15, 2003, from http://www.cae.wisc.edu/~ece539
[3] Mehrotra, K., Mohan, C., and Ranka, S., Elements of Artificial Neural Networks, Cambridge, The MIT Press, 1997.
[4] Orr, Mark, Radial Basis Function Networks, www.anc.ed.ac.uk/~mjo, Edinburgh University, Edinburgh, Scotland, February 2000.
[5] Mathworks, Radial Basis Functions. Retrieved November 25, 2003, from www.mathworks.com
[6] University of Tubingen, Radial Basis Functions (RBFs). Retrieved November 30, 2003, from http://www-ra.informatik.uni-tuebingen.de/SNNS/UserManual/node182.html