Time-Series Prediction Using A Local Linear Wavelet Neural Network
Abstract
A local linear wavelet neural network (LLWNN) is presented in this paper. The difference between this network and a conventional wavelet neural network (WNN) is that the connection weights between the hidden layer and the output layer of the conventional WNN are replaced by a local linear model. A hybrid training algorithm combining particle swarm optimization (PSO) with diversity learning and the gradient descent method is introduced for training the LLWNN. Simulation results for the prediction of time series show the feasibility and effectiveness of the proposed method.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Local linear wavelet neural network; Particle swarm optimization algorithm; Gradient descent
algorithm; Time-series prediction
1. Introduction
The wavelet neural network (WNN), which employs wavelets (basis functions that are localized in both the time domain and the frequency domain), has been developed as an alternative approach to the nonlinear fitting problem [31,36]. Two key problems in the design of a WNN are how to determine the WNN architecture and which learning algorithm can be effectively used for training the WNN [2]. These problems involve determining an optimal WNN architecture, arranging the windows of the wavelets, and finding a proper orthogonal or nonorthogonal wavelet basis. The curse of dimensionality remains a largely unsolved problem in WNN theory and brings difficulties when applying a WNN to high-dimensional problems.
Basis function neural networks are a class of neural networks in which the output of the network is a weighted sum of a number of basis functions. Commonly used basis functions include Gaussian radial basis functions, B-spline basis functions, wavelet basis functions and some neuro-fuzzy basis functions [3,12].
Particle swarm optimization (PSO) is a population-based optimization method first proposed by Kennedy and Eberhart [14]. Some of the attractive features
of the PSO include ease of implementation and the fact that no gradient information
is required. It can be used to solve a wide array of different optimization problems.
Some example applications include neural network training [7,28,29,6] and function
minimization [23,24].
Time-series forecasting is an important research and application area.
Much effort has been devoted over the past several decades to the development
and improvement of time series forecasting models. Well established
time series models include: (1) linear models, e.g., moving average,
exponential smoothing and the autoregressive integrated moving average
(ARIMA); (2) nonlinear models, e.g., neural network models and fuzzy
system models [15,19]; and (3) the combination of linear and nonlinear
models [35].
In this paper, a local linear wavelet neural network (LLWNN) is proposed, in which the connection weights between the hidden layer units and the output units are replaced by a local linear model. The learning algorithm usually used for a WNN is the gradient descent method, but its disadvantages are slow convergence and a tendency to become trapped in local minima. A combined approach of PSO with adaptive diversity learning and the gradient descent method is proposed for training the LLWNN. Simulation results for time-series prediction problems show the effectiveness of the proposed method. The main contributions of this paper are that (1) the LLWNN provides a more parsimonious interpolation in high-dimensional spaces when modelling samples are sparse; and (2) a novel hybrid training algorithm for the WNN and LLWNN is proposed.
The paper is organized as follows. The LLWNN is introduced in Section 2. A
hybrid learning algorithm for training LLWNN is described in Section 3. The
experiments on time-series prediction problems are given in Section 4. A short
discussion is given in Section 5. Finally, concluding remarks are drawn in the last section.
$x = (x_1, x_2, \ldots, x_n)$,

where $\psi_i$ is the wavelet activation function of the $i$th unit of the hidden layer and $\omega_i$ is the weight connecting the $i$th unit of the hidden layer to the output layer unit. Note that for the $n$-dimensional input space, the multivariate wavelet basis function can be calculated by the tensor product of $n$ single wavelet basis functions as follows:

$$\psi(x) = \prod_{i=1}^{n} \psi(x_i). \qquad (3)$$
Obviously, the localization of the $i$th unit of the hidden layer is determined by the scale parameter $a_i$ and the translation parameter $b_i$. According to previous research, these two parameters can either be predetermined based upon wavelet transformation theory or be determined by a training algorithm. Note that the above WNN is a kind of basis function neural network in the sense that the wavelets constitute the basis functions.
Note that an intrinsic feature of basis function networks is the localized activation of the hidden layer units, so the connection weights associated with the units can be viewed as locally accurate piecewise constant models whose validity for a given input is indicated by the activation functions. Compared to the multilayer perceptron neural network, this local capacity provides advantages such as learning efficiency and structural transparency. However, it also leads to the main problem of basis function networks: due to the crudeness of the local approximation, a large number of basis function units have to be employed to approximate a given system. A shortcoming of the WNN is therefore that many hidden layer units are needed for higher-dimensional problems.
In order to take advantage of the local capacity of the wavelet basis functions
while not having too many hidden units, here we propose an alternative type of
WNN. The architecture of the proposed LLWNN is shown in Fig. 1. Its output in
the output layer is given by
$$y = \sum_{i=1}^{M} (\omega_{i0} + \omega_{i1} x_1 + \cdots + \omega_{in} x_n)\,\Psi_i(x) = \sum_{i=1}^{M} (\omega_{i0} + \omega_{i1} x_1 + \cdots + \omega_{in} x_n)\,|a_i|^{-1/2}\,\psi\!\left(\frac{x - b_i}{a_i}\right), \qquad (4)$$

where $x = [x_1, x_2, \ldots, x_n]$. Instead of the straightforward weight $\omega_i$ (a piecewise constant model), a linear model

$$v_i = \omega_{i0} + \omega_{i1} x_1 + \cdots + \omega_{in} x_n \qquad (5)$$
is introduced. The activities of the linear models $v_i$ ($i = 1, 2, \ldots, M$) are determined by the associated locally active wavelet functions $\psi_i(x)$ ($i = 1, 2, \ldots, M$); thus $v_i$ is only locally significant. The motivations for introducing the local linear models into a WNN are as follows: (1) local linear models have been studied in some neuro-fuzzy systems and have shown good performance [9,8]; and (2) local linear models should provide a more parsimonious interpolation in high-dimensional spaces when modelling samples are sparse.
The scale and translation parameters and local linear model parameters are
randomly initialized at the beginning and are optimized by a hybrid learning
algorithm discussed in the following section.
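To make Eqs. (4) and (5) concrete, the following is a minimal Python sketch of the LLWNN forward pass. It is an illustration rather than the authors' implementation: the NumPy code, the array shapes and the choice of mother wavelet (the one later used in the experiments, Eq. (12)) are our own assumptions, and applying the $|a_i|^{-1/2}$ normalization per dimension is one possible reading of Eq. (4).

```python
import numpy as np

def mother_wavelet(z):
    # Illustrative choice: psi(z) = z * exp(-z^2 / 2), applied elementwise
    # (the wavelet of Eq. (12)); any admissible mother wavelet could be used.
    return z * np.exp(-0.5 * z ** 2)

def llwnn_output(x, a, b, w):
    """Forward pass of Eq. (4) for a single input vector.

    x : (n,)     input vector
    a : (M, n)   scale (dilation) parameters a_i
    b : (M, n)   translation parameters b_i
    w : (M, n+1) local linear model parameters [w_i0, w_i1, ..., w_in]
    """
    z = (x - b) / a                                              # (M, n)
    # Tensor-product wavelet of Eq. (3); the |a|^{-1/2} normalization of
    # Eq. (4) is applied per dimension here (one possible interpretation).
    psi = np.prod(np.abs(a) ** -0.5 * mother_wavelet(z), axis=1)  # (M,)
    v = w[:, 0] + w[:, 1:] @ x                                    # v_i of Eq. (5)
    return float(np.sum(v * psi))

# Tiny usage example: M = 3 hidden units, n = 2 inputs, random parameters.
rng = np.random.default_rng(0)
y = llwnn_output(rng.normal(size=2),
                 rng.uniform(0.5, 2.0, size=(3, 2)),
                 rng.normal(size=(3, 2)),
                 rng.normal(size=(3, 3)))
```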
Fig. 1. The architecture of the proposed LLWNN: the inputs $x_1, \ldots, x_n$ feed $M$ wavelet units $\psi_1, \ldots, \psi_M$; the output of each unit is weighted by its local linear model $\omega_{i0} + \omega_{i1} x_1 + \cdots + \omega_{in} x_n$ and the weighted outputs are summed to give $y$.
The main motivation for improving the particles with a diversity operator in the search space is that the particles in the basic PSO tend to cluster too closely. When an optimum (local or global) is found by one particle, the other particles will be drawn towards it. If all particles end up in this optimum, they will stay there without much chance of escaping; this happens simply because of the way the basic PSO works. If the identified optimum is only a local one, it would be advantageous to let some of the particles explore other areas of the search space rather than fine-tune the current solution.
where the adjustable parameters $q \in [0, 1]$ and $\beta$ are used to control the range and direction of the diversification search. Two example graphs of the PDF are shown in Fig. 2 (left and right). It can be seen that the smaller the $\beta$, the larger the local search range, and the larger the $q$, the higher the search probability in the positive direction; $q = 0.5$ means that the search probabilities in the positive and negative directions are equal. By inverting the above PDF, a random diversity search vector $d = [d_1, d_2, \ldots, d_N]$, distributed according to the proposed PDF in Eq. (8), can be obtained as follows:
$$d_i = \begin{cases} \dfrac{1}{\beta}\,\ln\dfrac{r_i}{1 - q_i}, & 0 < r_i \le 1 - q_i, \\[1ex] -\dfrac{1}{\beta}\,\ln\dfrac{1 - r_i}{q_i}, & 1 - q_i < r_i < 1, \end{cases} \qquad (9)$$
where $r_i$ is a random real number uniformly distributed on $[0, 1]$, $q_i = 0.5$ in our experiments, $i = 1, 2, \ldots, N$, and $N$ is the number of particles whose tuning strategy follows the diversity rule.
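As an illustration of Eq. (9), the following minimal Python sketch draws the components of a diversity vector by inverse-transform sampling. The function name and vectorized form are ours, and the sign of the second branch is reconstructed so that $q$ controls the probability of searching in the positive direction, as stated above.

```python
import numpy as np

def sample_diversity_vector(dim, beta, q=0.5, rng=np.random.default_rng()):
    """Draw `dim` components of a diversity vector d by inverting the
    PDF of Eq. (8), as in Eq. (9)."""
    r = rng.uniform(0.0, 1.0, size=dim)
    return np.where(r <= 1.0 - q,
                    (1.0 / beta) * np.log(r / (1.0 - q)),     # negative direction
                    -(1.0 / beta) * np.log((1.0 - r) / q))    # positive direction

# Example: a small beta yields a wide spread of diversity steps around zero.
d = sample_diversity_vector(dim=5, beta=0.05)
```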
In each generation, after the fitness values of all the particles have been calculated, the top 70% best-performing ones are marked and form the first group. The other 30% of the particles enhance themselves by using the diversity rule as follows:

$$z(t + 1) = z(t) + d(t + 1), \qquad (10)$$
Fig. 2. A specific probability distribution function whose shape depends on the parameters $\beta$ and $q$: (a) the larger the $\beta$, the smaller the local search range; (b) the larger the $q$, the higher the search probability in the positive direction.
where $z(t + 1)$ represents the model's free parameters at generation $t + 1$ and each element of the vector $d(t + 1)$ takes the same form as shown in Eq. (9). The particles in the first group enhance themselves based on their own private cognition and global social interactions with each other (this means that the global best position, $p_g$, is taken from the total population), and evolve by using Eqs. (6) and (7). It should be noted that the control parameter $\beta$ in Eq. (9) is adjusted over the iteration steps of the search process as follows:

$$\beta(t + 1) = \frac{(\beta(t) - \beta(0))(\mathrm{MAXITER} - \mathrm{iter})}{\mathrm{MAXITER}} + \beta(0). \qquad (11)$$
The process is shown in Fig. 3: as $\beta$ decreases in the initial stage of the search, the local search range quickly becomes larger, and $\beta$ then remains at a fixed value.
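Eq. (11) can be written as a one-line update. The sketch below (variable names are ours) reads $\beta(1)$ as the value used at the first iteration and $\beta(0)$ as the floor towards which $\beta$ is annealed, which matches the settings of Table 1 and the behaviour shown in Fig. 3.

```python
def update_beta(beta_t, beta_0, it, max_iter):
    # Anneal the diversity control parameter according to Eq. (11).
    return (beta_t - beta_0) * (max_iter - it) / max_iter + beta_0

# With beta(1) = 0.4, beta(0) = 0.05 and MAXITER = 2000, beta falls back to
# roughly 0.05 within about the first ten percent of the iterations.
beta = 0.4
for it in range(1, 2000):
    beta = update_beta(beta, 0.05, it, 2000)
```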
Fig. 3. The variation of the control parameter $\beta$ with maximum iteration number 2000, $\beta(0) = 0.05$ and $\beta(1) = 0.4$.

3.3. Combination of ADLPSO and gradient descent algorithms for training LLWNN
The ADLPSO algorithm is first employed to search globally for a near-optimal parameter vector; then a gradient descent search algorithm is employed to fine-tune this solution.
Before describing the details of the algorithm for training the LLWNN, the issue of coding is presented. Coding concerns the way the weights, dilation and translation parameters of the LLWNN are represented by individuals or particles. A floating-point coding scheme is adopted here. For LLWNN coding, suppose there are $M$ nodes in the hidden layer and $n$ input variables; then the total number of parameters to be coded is $(2n + n + 1)M = (3n + 1)M$. The coding of an LLWNN into an individual or particle is as follows:

$$|\,a_{11}\, b_{11} \ldots a_{1n}\, b_{1n}\, \omega_{10}\, \omega_{11} \ldots \omega_{1n}\,|\,a_{21}\, b_{21} \ldots a_{2n}\, b_{2n}\, \omega_{20}\, \omega_{21} \ldots \omega_{2n}\,|\,\ldots\,|\,a_{M1}\, b_{M1} \ldots a_{Mn}\, b_{Mn}\, \omega_{M0}\, \omega_{M1} \ldots \omega_{Mn}\,|$$
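A minimal sketch of this floating-point coding in Python, assuming the parameters are held as NumPy arrays; the helper names `encode_particle`/`decode_particle` are ours, not the paper's:

```python
import numpy as np

def encode_particle(a, b, w):
    # a, b: (M, n) dilation/translation parameters; w: (M, n+1) local linear weights.
    # Each hidden unit contributes 2n + (n + 1) = 3n + 1 values, so the particle has
    # (3n + 1) * M components, in the order |a_i1 b_i1 ... a_in b_in w_i0 ... w_in|.
    M, n = a.shape
    ab = np.stack([a, b], axis=2).reshape(M, 2 * n)   # interleave a_ij, b_ij
    return np.concatenate([ab, w], axis=1).ravel()

def decode_particle(particle, M, n):
    per_unit = np.asarray(particle).reshape(M, 3 * n + 1)
    ab = per_unit[:, :2 * n].reshape(M, n, 2)
    return ab[..., 0], ab[..., 1], per_unit[:, 2 * n:]   # a, b, w
```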
The basic loop of the proposed training algorithm for the LLWNN is as follows.
S1: Initialization. An initial population is generated randomly. The learning parameters, such as $c_1$, $c_2$, $\beta(0)$ and $\beta(1)$ in ADLPSO, and the learning rate and momentum in BP, should be assigned in advance.
S2: Parameter optimization with the ADLPSO algorithm.
S3: If the maximum number of generations is reached or no better parameter vector is found for a significantly long time (100 steps), then go to step S4; otherwise go to step S2.
S4: Parameter optimization with the gradient descent algorithm.
S5: If a satisfactory solution is found, then stop; otherwise go to step S4.
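The loop S1-S5 can be summarized as follows. This is a structural sketch in Python-style pseudocode only; the helpers `random_particle`, `adlpso_generation` and `gradient_descent_epoch` are hypothetical stand-ins for the steps described above, not functions defined in the paper.

```python
def train_llwnn(train_data, pop_size=50, max_gen=2000, stall_limit=100,
                target_rmse=1e-3, max_epochs=3000):
    # S1: random initialization of the particle population (learning
    # parameters such as c1, c2, beta(0), beta(1) are assumed set elsewhere).
    population = [random_particle() for _ in range(pop_size)]
    best, best_rmse, stall = None, float("inf"), 0

    # S2-S3: global search with ADLPSO until the generation budget is exhausted
    # or no better parameter vector is found for `stall_limit` generations.
    for _ in range(max_gen):
        population, gen_best, gen_rmse = adlpso_generation(population, train_data)
        if gen_rmse < best_rmse:
            best, best_rmse, stall = gen_best, gen_rmse, 0
        else:
            stall += 1
        if stall >= stall_limit:
            break

    # S4-S5: fine-tune the best solution with gradient descent (with momentum)
    # until a satisfactory solution is found or the epoch budget runs out.
    for _ in range(max_epochs):
        best, best_rmse = gradient_descent_epoch(best, train_data)
        if best_rmse <= target_rmse:
            break
    return best, best_rmse
```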
4. Experiments
The developed LLWNN model is applied here in conjunction with two time-series
prediction problems: the Box–Jenkins time series and the Mackey–Glass time series.
Well-known benchmark examples are used for the sake of an easy comparison with
existing models.
In this work, the mother wavelet (Eq. (12)) is used for both experiments.
$$\psi(x) = x \exp\!\left(-\frac{x^2}{2}\right). \qquad (12)$$
The objective function used is the root mean square error (RMSE),
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i1} - y_{i2})^2}, \qquad (13)$$

where $y_{i1}$ and $y_{i2}$ denote the target output and the model output, respectively.
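For reference, Eqs. (12) and (13) translate directly into a few lines of Python (the NumPy-based form is our own convenience):

```python
import numpy as np

def psi(x):
    # Mother wavelet of Eq. (12).
    return x * np.exp(-0.5 * x ** 2)

def rmse(y_target, y_model):
    # Objective function of Eq. (13): root mean square error.
    return np.sqrt(np.mean((np.asarray(y_target) - np.asarray(y_model)) ** 2))
```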
The parameters used for both experiments in the optimization of the LLWNN and the WNN are listed in Table 1. In addition, all experiments were performed using a 2.0 GHz processor with 512 MB of RAM.
Table 1
Parameter settings

Population size: 50
$c_1$ and $c_2$: 2.0
$a$: 0.7
$w$: 0.8
$\beta(0)$: 0.05
$\beta(1)$: 0.4
Learning rate: 0.05
Momentum term: 0.75
In this section, the proposed LLWNN model is applied to the gas furnace data (Series J) prediction problem [1]. The data set was recorded from a combustion process of a methane-air mixture. It is well known and frequently used as a benchmark example for testing identification and prediction algorithms. The data set consists of 296 pairs of input-output measurements. The input $u(t)$ is the gas flow into the furnace and the output $y(t)$ is the CO$_2$ concentration in the outlet gas. The sampling interval is 9 s.
Following previous researchers [16], in order to make a meaningful comparison, the inputs of the prediction model are selected as $u(t-4)$ and $y(t-1)$, and the output is $y(t)$.
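As a sketch of how these input-output pairs can be assembled, assume the gas flow and CO2 series are available as NumPy arrays `u` and `y` of length 296 (the variable names and the loading step are our own assumptions):

```python
import numpy as np

def make_box_jenkins_samples(u, y):
    # Inputs u(t-4) and y(t-1), target y(t), following the setup above;
    # t starts at 4 (0-based indexing) so that u(t-4) is defined.
    X = np.array([[u[t - 4], y[t - 1]] for t in range(4, len(y))])
    targets = np.asarray(y[4:])
    return X, targets

# With the 296 measurement pairs this yields 292 samples; the first 200 are
# used for training and the remaining 92 for testing, as described below.
# X, targets = make_box_jenkins_samples(u, y)
# X_train, y_train, X_test, y_test = X[:200], targets[:200], X[200:], targets[200:]
```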
We simulated the following four cases: (1) the gradient descent algorithm was employed to train a WNN model with network architecture {2-8-1}; (2) the proposed hybrid learning algorithm was employed to train a WNN model with the same network architecture; (3) the gradient descent algorithm (with momentum) was employed to train an LLWNN model with network architecture {2-8-1}; and (4) the proposed hybrid learning algorithm was employed to train an LLWNN model with the same network architecture. For each of the cases, the data were partitioned into a training set of 200 data points and a test set of the remaining 92 points for testing the performance of the evolved model. In order to remove the effects of the initial values of the free parameters on the final results, 20 experiments (runs) were performed for each case with randomly set initial parameters. Each of the models was trained for 3000 epochs. Over 20 runs, the average RMSEs for cases 1, 2, 3 and 4 on the test data set are 0.048, 0.042, 0.034 and 0.025, respectively. The maximum and minimum deviations in RMSE are 0.033 and 0.014. For the best results, a comparison of the LLWNN and the WNN trained with the gradient descent algorithm and with the new hybrid technique is shown in Table 2. Table 3 shows a comparison of the test results of different models for the Box–Jenkins data prediction problem. Fig. 4 compares the actual time series, the output of the best LLWNN model and the prediction error, for the training and test data sets, using the hybrid training algorithm. The methods were also tested with different numbers of hidden units; only the best results obtained for the LLWNN and the WNN are compared in Fig. 5.
Table 2
Comparison of LLWNN and WNN for Box–Jenkins time-series prediction (columns: Model, Structure, Parameters, Training time (s), Training RMSE, Testing RMSE)
Table 3
Comparison of test results of different models for Box–Jenkins data prediction problem
It is clear that the LLWNN is more accurate than the conventional WNN, although it has a few more free parameters. Meanwhile, the simulation results demonstrate that the new hybrid training algorithm is more efficient than the conventional gradient learning algorithm.
Fig. 4. The time-series data, the output of the LLWNN and the prediction error for the training and test samples using the hybrid training algorithm.
Fig. 5. The performance of the LLWNN and the WNN in RMSE on the training data set with different numbers of hidden units.
where $\tau > 17$, the equation exhibits chaotic behavior. To make the comparison with earlier work fair, we predict $x(t + 6)$ using the input variables $x(t)$, $x(t - 6)$, $x(t - 12)$ and $x(t - 18)$.
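The Mackey-Glass equation itself is not reproduced in this excerpt; the sketch below assumes the standard form $dx(t)/dt = 0.2\,x(t-\tau)/(1 + x^{10}(t-\tau)) - 0.1\,x(t)$ with $\tau = 17$ and a simple Euler discretization (the delay value, step size, initial condition and function names are our illustrative choices), and then builds the input-output pairs described above.

```python
import numpy as np

def mackey_glass(n_points=1000, tau=17, dt=1.0, x0=1.2):
    """Generate a Mackey-Glass series by Euler integration of the standard
    delay differential equation (an illustrative, not authoritative, setup)."""
    history = int(tau / dt)
    x = np.full(n_points + history, x0)
    for t in range(history, n_points + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t])
    return x[history:]

def make_mg_samples(x):
    # Inputs x(t), x(t-6), x(t-12), x(t-18); target x(t+6).
    ts = range(18, len(x) - 6)
    X = np.array([[x[t], x[t - 6], x[t - 12], x[t - 18]] for t in ts])
    y = np.array([x[t + 6] for t in ts])
    return X, y
```

Under this sketch, the first 500 of the resulting pairs would form the training set and the remaining pairs the test set, as in the experimental setup described below.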
We simulated the following four cases: (1) the gradient descent algorithm was employed to train a WNN model with network architecture {4-10-1}; (2) the proposed hybrid learning algorithm was employed to train a WNN model with the same network architecture; (3) the gradient descent algorithm (with momentum) was employed to train an LLWNN model with network architecture {4-10-1}; and (4) the proposed hybrid learning algorithm was employed to train an LLWNN model with the same network architecture. For each of the cases, 1000 sample points were used in our study. The first 500 data pairs of the series were used as training data, while the remaining 500 were used to validate the identified model. In order to remove the effects of the initial values of the free parameters on the final results, 20 experiments (runs) were performed for each case with randomly set initial parameters. Each of the models was trained for 3000 epochs. Over 20 runs, the average RMSEs for cases 1, 2, 3 and 4 on the test data set are 0.0089, 0.071, 0.049 and 0.042, respectively. The maximum and minimum deviations in RMSE are 0.0028 and 0.012. For the best results, a comparison of the LLWNN and the WNN trained with the gradient descent algorithm and with the new hybrid technique is shown in Table 4. Table 5 shows a comparison of the test results of different models for the chaotic Mackey–Glass prediction problem. Fig. 6 compares the actual time series, the output of the best LLWNN model and the prediction error, for the training and test data sets, using the hybrid training algorithm. The methods were also tested with different numbers of hidden units; only the best results obtained for the LLWNN and the WNN are compared in Fig. 7.
It is also clear that the LLWNN is more accurate than the conventional WNN, although it has a few more free parameters. Meanwhile, the simulation results demonstrate that the new hybrid training algorithm is more efficient than the conventional gradient learning algorithm.
From the above simulation results, it can be seen that the proposed LLWNN model with the new hybrid technique works well for generating prediction models of time series.
Table 4
Comparison of LLWNN and WNN for Mackey–Glass time-series prediction
Table 5
Comparison of test results of different models for the Mackey–Glass time-series problem
Fig. 6. The actual time-series data, the output of the LLWNN model and the prediction error for the training and test samples using the hybrid training algorithm.

Fig. 7. The performance of the LLWNN and the WNN in RMSE on the training data set with different numbers of hidden units.

5. Discussion
With various wavelets used as activation functions and gradient-descent-based training algorithms, previous studies have demonstrated the success of the conventional WNN [36]. However, a large number of basis function units must be carefully determined for a high-dimensional nonlinear problem.
The experiments with the proposed LLWNN demonstrate that only a few wavelet basis functions are needed to solve a given approximation/prediction problem with sufficient accuracy. This is because the local linear models provide more modelling power than constant weights. Moreover, the dilation and translation parameters of the LLWNN are randomly generated and then optimized, without predetermination.
The hybrid training algorithm combining ADLPSO and the gradient descent method makes the parameter search more effective: the ADLPSO quickly performs a global search to locate the region of the global minimum of the objective function, and the gradient descent algorithm then finds the accurate position of the minimum.
6. Conclusion
In this paper, an LLWNN was proposed. The characteristic of the network is that the straightforward weights are replaced by local linear models. The working process of the proposed network can be viewed as decomposing a complex, nonlinear system into a set of locally active submodels and then smoothly integrating those submodels through their associated wavelet basis functions. One advantage of the proposed method is that it needs fewer wavelet units for a given problem than the common WNN. A fast hybrid training algorithm based on ADLPSO and gradient descent is also introduced for training the LLWNN. Simulation results for time-series prediction problems showed the effectiveness of the proposed approach.
Acknowledgements
This research was partially supported by the National High Technology Development Program of China (863 Program) under contract number 2002AA4Z3240 and by the Provincial Science and Technology Development Program of Shandong under contract no. SDSP2004-0720-03.
References
[1] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.
[2] Y.H. Chen, et al., Evolving wavelet neural networks for system identification, Proceedings of the International Conference on Electrical Engineering, 2000, pp. 279–282.
[3] Y.H. Chen, et al., Evolving the basis function neural networks for system identification, Int. J. Adv.
Comput. Intell. 5 (4) (2001) 229–238.
[4] Y.H. Chen, et al., Nonlinear system modelling via optimal design of neural trees, Int. J. Neural Sys.
14 (2) (2004) 125–137.
[5] K.B. Cho, et al., Radial basis function based adaptive fuzzy systems and their application to system identification and prediction, Fuzzy Sets Syst. 83 (1995) 325–339.
[6] R.C. Eberhart, X. Hu, Human tremor analysis using particle swarm optimization, Proceedings of the
Congress on Evolutionary Computation, 1999, pp. 1927–1930.
[7] A.P. Engelbrecht, A. Ismail, Training product unit neural networks, Stability Control: Theory Appl.
2 (1–2) (1999) 59–74.
[8] B. Fischer, O. Nelles, R. Isermann, Adaptive predictive control of a heat exchanger based on a fuzzy
model, Control Eng. Practice 6 (1998) 259–269.
[9] B. Foss, T.A. Johansen, On local and fuzzy modelling, Proceedings of 3rd International Industrial
Fuzzy Control and Intelligent Systems, 1993, pp. 134–139.
[10] J.S.R. Jang, et al., Neuro-fuzzy and Soft Computing: a Computational Approach to Learning and
Machine Intelligence, Prentice-Hall, Upper Saddle River, NJ, 1997.
[11] N.K. Kasabov, et al., FuNN/2-A fuzzy neural network architecture for adaptive learning and
knowledge acquisition, Inform. Sci. 101 (1997) 155–175.
[12] S. Kawaji, Y.H. Chen, Evolving Neurofuzzy system by hybrid soft computing approaches for system
identification, Int. J. Adv. Comput. Intel. 5 (4) (2001) 229–238.
[13] J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm
performance, Proceedings of the 1999 Congress of Evolutionary Computation, vol. 3, 1999, pp.
1931–1938.
[14] J. Kennedy, R.C. Eberhart, Particle swarm optimization, Proc. IEEE Int. Conf. Neural Networks IV
(1995) 1942–1948.
[15] D. Kim, et al., Forecasting time series with genetic fuzzy predictor ensembles, IEEE Trans. Fuzzy
Syst. 5 (1997) 523–535.
[16] J. Kim, et al., HyFIS: adaptive neuro-fuzzy inference systems and their application to nonlinear
dynamical systems, Neural Networks 12 (1999) 1301–1319.
[17] C.C. Lee, et al., A combined approach to fuzzy model identification, IEEE Trans. Syst. Man
Cybernet. 24 (1994) 736–744.
[18] Y. Lin, et al., A new approach to fuzzy-neural system modelling, IEEE Trans. Fuzzy Syst. 3 (1995)
190–198.
[19] K.S. Narendra, et al., Adaptive control using neural networks and approximation models, IEEE
Trans. Neural Networks 8 (3) (1997) 475–485.
[20] J. Nie, Constructing fuzzy model by self-organising counterpropagation network, IEEE Trans. Syst.
Man Cybernet. 25 (1995) 963–970.
[21] W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets Syst. 13 (1984) 153–167.
[22] I. Rojas, et al., Time series analysis using normalized PG-RBF network with regression weights,
Neurocomputing 42 (2002) 167–285.
[23] Y. Shi, R.C. Eberhart, A modified particle swarm optimizer, IEEE International Conference of
Evolutionary Computation, May 1998, pp. 367–372.
[24] Y. Shi, R.C. Eberhart. Empirical study of particle swarm optimization, Proceedings of the Congress
on Evolutionary Computation, 1999, pp. 1945–1949.
[25] M. Sugeno, et al., Linguistic modelling based on numerical data, Proceedings of the IFSA’91, 1991,
pp. 234–247.
[26] H. Surmann, et al., Self-organising and genetic algorithm for an automatic design of fuzzy control
and decision systems, Proceedings of the FUFIT’s 93, 1993, pp. 1079–1104.
[27] R.M. Tong, The evaluation of fuzzy models derived from experimental data, Fuzzy Sets Syst. 4 (1980)
1–12.
[28] F. Van den Berg, Particle swarm weight initialization in multi-layer perceptron artificial neural networks, in: D. Sha (Ed.), Development and Practice of AI Techniques, Proceedings of the International Conference on Artificial Intelligence (ICAI'99), Durban, September 1999, pp. 41–45.
[29] F. Van den Berg, A.P. Engelbrecht, Cooperative learning in neural networks using particle swarm
optimizers, S. Afr. Comp. J. (2000), pp. 84–90.
[30] L.X. Wang, et al., Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man
Cybernet. 22 (1992) 1414–1427.
[31] T. Wang, et al., A wavelet neural network for the approximation of nonlinear multivariable function,
Trans. Inst. Electr. Eng. C 102-C (2000) 185–193.
[32] X. Yao, Evolving artificial neural networks, Proc. IEEE 87 (9) (1999) 1423–1447.
[33] C.W. Xu, Fuzzy model identification and self-learning for dynamic systems, IEEE Trans. Syst. Man
Cybernet. 17 (1987) 683–689.
[34] H. Yoshida, et al., A particle swarm optimization for reactive power and voltage control considering
voltage security assessment, IEEE Trans. Power Syst. 15 (4) (2000).
[35] G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model,
Neurocomputing 50 (2003) 159–175.
[36] Q. Zhang, A. Benveniste, Wavelet Networks, IEEE Trans. Neural Networks 3 (6) (1992) 889–898.
Yuehui Chen was born in 1964. He received his B.Sc. degree in mathematics/automatics from Shandong University, China, in 1985, and his Ph.D. degree in electrical engineering from Kumamoto University, Japan, in 2001. From 2001 to 2003 he worked as a senior researcher at the Memory-Tech Corporation in Tokyo. Since 2003 he has been a member of the Faculty of Electrical Engineering at Jinan University, where he is currently head of the Laboratory of Computational Intelligence. His research interests include evolutionary computation, neural networks, fuzzy systems, hybrid computational intelligence and their applications in time-series prediction, system identification and intelligent control. He is the author and co-author of more than 60 papers.
Professor Yuehui Chen is a member of IEEE, the IEEE Systems, Man and Cybernetics Society and the
Computational Intelligence Society. He is also a member of the editorial boards of several technical
journals and a member of the program committee of several international conferences.
Jiwen Dong received his B.E. and M.E. degrees in computer science and automatics from Wuhan University and the Wuhan University of Science and Technology, China, in 1985 and 1995, respectively. He is currently an associate professor and vice president of the School of Information Science and Engineering of Jinan University, and a Ph.D. student at the Wuhan University of Science and Technology. His research interests include neural networks, fuzzy systems, evolutionary algorithms and image processing.