CONSTRUCTIVE NEURAL NETWORKS: A REVIEW
Sudhir Kumar Sharma
Ansal Institute of Technology, GGS Indraprastha University
Gurgaon-1220033, Haryana, India
Pravin Chandra
Institute of Informatics & Communication, University of Delhi
South Campus, New Delhi, India
Abstract:
In conventional neural networks the architecture has to be defined prior to training, whereas in constructive neural
networks the network architecture is constructed during the training process. In this paper, we review
constructive neural network algorithms that construct feedforward architectures for regression problems.
The Cascade-Correlation algorithm (CCA) is a well-known and widely used constructive algorithm. The Cascade 2
algorithm is a variant of CCA that has been found to be more suitable for regression problems and is reviewed in this
paper. We also review our two recently proposed constructive algorithms that emphasize architectural adaptation
and functional adaptation during training. To achieve functional adaptation, the slope of the sigmoidal function
is adapted during learning. The algorithm determines not only the optimum number of hidden layer nodes but
also the optimum value of the slope parameter of the sigmoidal function. The role of the adaptive sigmoidal activation
function has been verified in constructive neural networks for better generalization performance and lesser
training time.
Keywords: Adaptive slope sigmoidal function; Constructive neural networks; Constructive algorithm;
Cascade 2 algorithm.
1. Introduction
Many types of neural network models have been proposed for function approximation (pattern classification and
regression problems). Among them, the class of multilayer feedforward neural networks (FNNs) is the most
popular, due to their structural flexibility, good representational capabilities (they are universal approximators), and
the large number of available training algorithms.
In general, the learning accuracy, the generalization ability and the training time of supervised learning in FNNs
depend on various factors such as the chosen network architecture (the number of hidden nodes and the connection
topology between nodes), the choice of activation function for each node, the choice of the optimization method
and other training parameters (learning rate, initial weights, etc.). The architecture of the network is
either fixed empirically prior to training or is dynamically adjusted during training of the network for solving a
specific problem.
If the chosen fixed-size network architecture is not appropriate for the problem, then under-fitting or over-fitting takes place. For better generalization performance and lesser training time, neither too small nor too large
a network architecture is desirable. We need a sufficient number of trainable parameters (weights, biases and
parameters associated with the activation function) to capture the unknown mapping function from the training data.
Single-hidden-layer FNNs (SLFNNs) with a sufficient number of hidden nodes are universal approximators
(UAPs), i.e., these models are capable of approximating any continuous function to any desired degree of
accuracy [1], [2]. These results, however, give no guidance on selecting the optimum number of hidden nodes.
There are, moreover, a number of situations where two hidden layers have been more effective in terms of
generalization ability and training time. There are no known efficient methods for determining the optimum
network architecture for a problem at hand, and the selection of the optimal network architecture remains an open
problem.
The adaptive structure neural network framework is a collection of techniques in which the network
structure is adapted during training according to the given problem. The structure adaptation may apply
at three levels, namely architecture adaptation, functional adaptation and training-parameter
adaptation. These approaches can be classified into two different groups: evolutionary and non-evolutionary.
Many evolutionary algorithms have been proposed that evolve the network architecture together with
weights based on global optimization techniques, like genetic algorithms, genetic programming and
evolutionary strategies [3], [4]. Global search methods like ant colony optimization and particle swarm
optimization are also widely used to determine the optimum architecture during learning [5], [6].
However, the evolutionary approach is quite demanding in both time and user-defined parameters [7].
2. Non-Evolutionary Adaptive Structure Neural Networks
Unlike conventional neural network (NN) algorithms, which require the NN architecture to be defined before
training starts, adaptive structure neural networks enable the network architecture to be constructed along with
the training process.
Many methods have been proposed to determine the optimal network architecture during training,
such as various constructive, pruning, constructive-pruning, and regularization algorithms. A constructive
algorithm adds hidden layers, nodes, and connections to a minimal NN architecture during training. A pruning
algorithm does the opposite, i.e., it deletes redundant hidden layers, nodes, and connections, starting from a larger
NN, during training. A constructive-pruning algorithm is a hybrid approach in which the NN may be pruned after
completion of the constructive process, or pruning may be interleaved with the constructive process. A regularization method
adds (or subtracts) a penalty term to the error function to be minimized (or maximized) so that the effect of
unimportant network connection weights is decreased in the trained network. The modified error function has
the form $\tilde{E} = E + \lambda E_{reg}$, where $E$ is the original error function, $E_{reg}$ is the penalty term and $\lambda$ is a regularization parameter that controls the influence of the regularization term. The
difficulty in using such a modified error function lies in choosing a suitable regularization parameter, which often
requires trial and error. The regularization framework can be used with constructive and pruning algorithms [7],
[8].
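To make the regularization idea concrete, the following Python sketch shows a squared-error objective with an added L2 (weight-decay) penalty; the function name, the choice of an L2 penalty and the default value of the regularization parameter are illustrative assumptions, not part of any algorithm reviewed here.

import numpy as np

def regularized_error(errors, weights, lam=1e-3):
    """Squared-error objective with an L2 (weight-decay) penalty term.
    errors  : residuals (desired minus actual output) over all patterns and output nodes
    weights : flat array of all connection weights of the network
    lam     : regularization parameter controlling the influence of the penalty
              (in practice chosen by trial and error)
    """
    data_term = 0.5 * np.sum(errors ** 2)   # original error function E
    penalty = lam * np.sum(weights ** 2)    # penalty term discouraging large weights
    return data_term + penalty              # modified error of the form E + lambda * penalty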
Constructive algorithms have the following major advantages over the pruning algorithms:
(1) It is relatively easier to specify an initial network architecture in constructive algorithms, whereas in
pruning algorithms one usually does not know a priori how large the initial network should be. Therefore,
an initial network that is much larger than actually required by the underlying problem is usually chosen in
pruning algorithms, leading to a computationally expensive network training process.
(2) Constructive algorithms tend to build small networks due to their incremental learning nature. The constructed
network corresponds to the complexity of the given problem, whereas in pruning algorithms a large effort may be
spent in pruning the redundant weights and hidden nodes contained in the network. Thus,
constructive algorithms are generally more economical (in terms of training time and network
complexity/structure) than pruning algorithms.
(3) In constructive algorithms, a smaller number of parameters (weights) has to be updated in the initial stage of
the training process, thus requiring less training data for good generalization, whereas pruning algorithms
require a sufficiently large amount of training data.
(4) A common feature of constructive algorithms is the assumption that the hidden nodes already installed in the
network are useful in modeling part of the underlying function. In that case, the weights feeding into these
installed nodes can be frozen to avoid the moving target problem. The number of weights to be optimized at a
time is reduced, so that time and memory requirements are decreased.
(5) In pruning algorithms and regularization methods, several problem-dependent parameters need to be
properly specified or selected in order to obtain an acceptable network yielding satisfactory performance.
This requirement makes these algorithms more difficult to use in real-life applications.
3. Constructive Neural Networks
Constructive neural networks (CoNN) are a collection of algorithms that alter the network structure
as learning proceeds, automatically producing a network of appropriate size. The learning algorithms used
in CoNN are called constructive algorithms. A constructive algorithm starts with a minimal network architecture
and adds layers, nodes and connections during training, as required by the given problem. The architecture
adaptation process continues until the training algorithm finds a near-optimal architecture that gives a satisfactory
solution to the problem.
Six motivations for using constructive algorithms are listed with explanations in [Parekh et al., 2000]. These
are:
(1) Flexibility of exploring the space of neural network topologies
(2)
(3)
(4)
(5)
(6)
The addition of nodes in different hidden layers, however, is not straightforward, because one has to
decide whether a node will be added to an existing hidden layer or to a new hidden layer. To tackle this issue,
most existing algorithms add a predefined and fixed number of nodes in the first hidden layer, then add the same
number of nodes in the second hidden layer, and so on [Ma and Khorasani, 2003; Monirul Islam et al., 2009].
As mentioned above, this number is crucial for the performance of NNs, and restricting it to a small value
limits the ability of a hidden layer to form complicated feature detectors.
Any standard training algorithm based on local optimization methods for a fixed-size network architecture
may be used in conjunction with the constructive approach for determining the optimum set of weights of the
network. The usual choice is a local optimization method based on first-order gradient descent, like the
standard backpropagation algorithm [37] or its variants such as the QuickProp algorithm [38] and the RPROP
algorithm [39], or a second-order algorithm (using the information of the Hessian matrix in some form or the other),
like the quasi-Newton method [40] or the Levenberg-Marquardt algorithm [41].
There are a variety of ways of training the resulting network after each hidden node addition in constructive
algorithms. These can be classified into two general methods. The first consists of training the whole network
after the addition of a new hidden node. The second consists of training only the newly added node, with the
remaining weights frozen. The method for adding a new hidden node is standard across many constructive
algorithms and in general consists of either adding a new hidden node when the error fails to improve by a set amount
over a given period, or testing for some criterion such as a local minimum. Halting network construction is
equivalent to finding the best model for a given problem, and hence techniques such as early stopping are
employed.
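As a self-contained illustration of the second strategy (training only the newly added node, with earlier weights frozen) combined with early stopping, the following Python sketch grows a single-hidden-layer regressor one sigmoidal node at a time. It is a minimal sketch under assumed settings (learning rate, patience, node initialization), not the implementation of any specific published algorithm.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_new_node(X, r, epochs=200, lr=0.1):
    """Fit one new sigmoid hidden node (input weights w, bias c, output weight v)
    to the current residual r by gradient descent on 0.5*mean((r - v*h)**2)."""
    w = rng.normal(scale=0.5, size=X.shape[1])
    c = 0.0
    v = rng.normal(scale=0.1)
    for _ in range(epochs):
        h = sigmoid(X @ w + c)              # activation of the new node
        e = r - v * h                       # residual left after this node's contribution
        gh = -e * v * h * (1.0 - h)         # gradient flowing back through the sigmoid
        v -= lr * (-np.mean(e * h))         # update output weight
        w -= lr * (X.T @ gh) / len(X)       # update input weights
        c -= lr * np.mean(gh)               # update bias
    return w, c, v

def grow_network(X, y, Xv, yv, max_nodes=20, patience=3, tol=1e-4):
    """Add hidden nodes one at a time; previously trained nodes stay frozen.
    Construction halts by early stopping on a separate validation set."""
    nodes, r, rv = [], y.astype(float).copy(), yv.astype(float).copy()
    best_err, best_size, stall = np.inf, 0, 0
    for n in range(1, max_nodes + 1):
        w, c, v = train_new_node(X, r)
        r -= v * sigmoid(X @ w + c)         # training residual after installing the node
        rv -= v * sigmoid(Xv @ w + c)       # validation residual (frozen contribution)
        nodes.append((w, c, v))
        val_err = 0.5 * np.mean(rv ** 2)
        if val_err < best_err - tol:
            best_err, best_size, stall = val_err, n, 0
        else:
            stall += 1                      # validation error failed to improve enough
        if stall >= patience:               # halt network construction
            break
    return nodes[:best_size], best_err

A call such as grow_network(X_train, y_train, X_val, y_val) for one-dimensional targets returns only the nodes installed up to the best validation error, which is the early-stopping choice of model size described above.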
5. Cascade-Correlation algorithm and its variants
The cascade-correlation algorithm (CCA) was designed in [Fahlman and Lebiere, 1990] to overcome the local minima
problem, the step-size problem and the moving target problem, and to avoid having to define the number
of hidden nodes in advance. CCA is widely used for classification and function approximation tasks.
CCA adds one hidden node at a time to the cascade architecture; each new hidden node is connected to all inputs
as well as to the previously trained hidden nodes. After the training of the input weights of the current hidden node is
completed, the node is connected to the output nodes with its input weights frozen, and all input weights of the output
nodes are trained again. In the following subsections, several CCA variants and similar constructive algorithms are presented
[10].
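To clarify the cascade topology just described, the following Python sketch (an illustrative reconstruction, not Fahlman and Lebiere's code) performs a forward pass in which every hidden node receives the external inputs, a bias, and the outputs of all previously installed hidden nodes, while a single linear output node reads all of them.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade_forward(x, hidden_weights, output_weights):
    """Forward pass of a cascade network (single output node assumed for simplicity).
    x              : 1-D array of external inputs (a bias input of 1 is appended)
    hidden_weights : list of 1-D weight vectors; the n-th vector has length
                     len(x) + 1 + n, covering inputs, bias and all earlier hidden outputs
    output_weights : 1-D weight vector over [inputs, bias, all hidden outputs]
    """
    units = np.append(x, 1.0)           # external inputs plus the bias unit
    for w in hidden_weights:            # each hidden node sees everything installed before it
        h = sigmoid(units @ w)
        units = np.append(units, h)     # its output feeds all later nodes and the output node
    return units @ output_weights       # linear output node

When a new node is installed, only output_weights are retrained afterwards; the vectors already in hidden_weights stay frozen, mirroring the weight-freezing step described above.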
5.1. Cascade 2 algorithm
The Cascade 2 algorithm was also proposed by Fahlman, who implemented CCA. The Cascade 2 algorithm
differs from CCA in that a new hidden node is trained to directly minimize the residual error rather than to
maximize the covariance between the hidden node output and the residual error at the output nodes. In addition, the hidden
node has adjustable output connections to all of the output nodes; everything else is common to both
algorithms. Several authors have demonstrated that CCA is effective for classification but not very successful
on regression tasks. This is because the correlation term tends to drive the hidden node activations to their
extreme values, thereby making it hard for the network to produce a smoothly varying output [31], [42], [43].
For the sake of clarity, the Cascade 2 algorithm is described by the flow chart of Fig. 1; the concrete contents of each step are listed below (a code sketch of the candidate-training step follows the list):
(1) Step A (initializing the NN):
(i) Create the training and testing sets and fix the training parameters for the given problem.
(ii) Determine the number of nodes in the input and output layers according to the characteristics of the given problem.
(iii) Fully connect the input nodes to the output nodes.
(iv) Initialize all connection weight values.
(2) Step B1: Calculate the error over an epoch by
$\frac{1}{2}\sum_{p=1}^{P}\sum_{k=1}^{N_o}\left(d_{kp} - f_{kp}\right)^{2}$   (1)
where $d_{kp}$ is the desired output and $f_{kp}$ is the actual output at the k-th output node for the p-th pattern, $P$ is the total number of exemplars and $N_o$ is the number of output nodes.
(3) Step B2: Judge whether the performance, which depends on the training error, is smaller than the pre-specified error or not.
(4) Step B3: Judge whether the overall (output nodes) stopping criterion, which depends on the overall patience (the percentage change in the network error required to continue training and the length of the patience period), is satisfied or not.
(5) Update all weights connected to the output nodes by the gradient-descent method, to minimize the objective function described by (1).
(6) Initialize the candidate: connect all input nodes and previously installed hidden nodes to the candidate, connect the candidate to the output nodes, and initialize the new weights of the candidate.
(7) Calculate the difference between the error of the output nodes and the input from the candidate to these nodes, defined as
$\frac{1}{2}\sum_{p=1}^{P}\sum_{k=1}^{N_o}\left(e_{kp} - ow_{kn}O_{n}\right)^{2}$   (2)
where $e_{kp}$ is the residual error at the k-th output node for pattern p and $ow_{kn}O_{n}$ is the input from the n-th candidate node to the k-th output node; here $O_{n}$ is the output of the n-th candidate node and $ow_{kn}$ is the connection weight from the n-th candidate node to the k-th output node.
(8) Judge whether the local (hidden node addition) stopping criterion, which depends on the local patience (the percentage change in the network error required to continue training and the length of the patience period), is satisfied or not.
(9) Update all input and output weights of the added hidden node by the gradient-descent method, to minimize the objective function described by (2), while the main NN is frozen.
(10) Install the trained candidate into the NN; this is the interface between the B series and the C series of steps. All input connections to the candidate are frozen, and the output weights of the candidate are inserted with inverted sign:
$S = \sum_{p}\left(e_{p} - \bar{e}\right)\,y_{p}$   (3)
where p ranges over the training patterns, $y_{p}$ is the activation of the candidate hidden node for pattern p, $e_{p}$ is the residual error at the output node for pattern p and $\bar{e}$ is the average error of an output node over all patterns [32], [44].
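The candidate-training step (Steps (7)-(9) above) can be sketched in Python as gradient descent on the squared difference of Eq. (2), with the main network frozen; the learning rate, epoch count and initialization are assumed values, and the function is only an illustration of the reviewed description, not Fahlman's implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_candidate(U, E, epochs=300, lr=0.05, seed=0):
    """Train one Cascade 2 candidate so that its weighted output cancels the
    residual error at the output nodes (objective of Eq. (2)); the main network
    is frozen during this step.
    U : (P, m) array of candidate inputs per pattern (external inputs, bias and
        the outputs of previously installed hidden nodes)
    E : (P, K) array of residual errors e_kp at the K output nodes
    """
    rng = np.random.default_rng(seed)
    P, m = U.shape
    K = E.shape[1]
    w = rng.normal(scale=0.1, size=m)        # candidate input weights
    ow = np.zeros(K)                         # candidate-to-output weights ow_k
    for _ in range(epochs):
        O = sigmoid(U @ w)                   # candidate output O for every pattern
        D = E - O[:, None] * ow              # the differences (e_kp - ow_k * O)
        g_ow = -(D * O[:, None]).sum(axis=0) # gradient of Eq. (2) w.r.t. ow_k
        g_O = -(D * ow).sum(axis=1)          # gradient w.r.t. the candidate output
        g_w = U.T @ (g_O * O * (1.0 - O))    # chain rule through the sigmoid
        ow -= lr * g_ow / P
        w -= lr * g_w / P
    return w, ow   # the reviewed description installs the output weights with inverted sign (Step (10))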
… the node addition. Lastly, they can be classified on the basis of how the connection weights are frozen
and retrained [34].
Prechelt, 1997, investigated problems and improvements in CCA. He developed six variants of CCA, one of
them being the Cascade 2 algorithm. These variants were empirically compared using 42 different datasets from the
PROBEN1 benchmark [31].
Thivierge et al., 2003, implemented an algorithm that simultaneously grows and prunes cascade
networks. The pruning is done by removing irrelevant connections using the Optimal Brain Damage procedure
[45].
Islam and Murase, 2000, proposed a cascade neural network design algorithm for two-hidden-layer FNNs.
The method automatically determines the number of nodes in each hidden layer and can also reduce a two-hidden-layer network to a single-hidden-layer network. It is based on the use of a temporary weight-freezing
technique [46].
The fast constructive-covering algorithm for neural network construction proposed in [Wang, 2008] is based
on geometrical expansion. It has the advantage that each training example has to be learned only once, which
allows the algorithm to work faster than traditional training algorithms [47].
6. Adaptive slope sigmoidal function constructive algorithms
There are five major issues involved in constructive algorithms for regression tasks. These issues are as follows:
(1) The choice of minimal architecture and network growing strategy: How to connect a new hidden node to
the existing network?
(2) The choice of activation function: Which activation function to use at the hidden and output nodes?
(3) The choice of weight freezing: Whether to train the entire network or only the newly added hidden node.
(4) The choice of optimization technique: Which optimization method to use to determine the optimum weights
during training?
(5) The choice of training stoppage criteria: When to stop the addition of new hidden nodes, or, in other words,
what is the optimal number of hidden nodes to be installed in the network?
The generalization ability and training time of constructive algorithms for regression tasks depend on each
choice discussed above.
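These five choices can be viewed as the configuration of a constructive algorithm; the Python sketch below records them as a simple data structure (the field names and option strings are illustrative assumptions, not a published interface).

from dataclasses import dataclass

@dataclass
class ConstructiveConfig:
    """Illustrative record of the five design choices listed above."""
    growing_strategy: str = "cascade"        # (1) how a new hidden node is wired into the network
    activation: str = "adaptive_sigmoid"     # (2) activation function at the hidden and output nodes
    freeze_old_weights: bool = True          # (3) train only the newly added node vs. the whole network
    optimizer: str = "gradient_descent"      # (4) local optimization method for the weights
    stopping: str = "early_stopping"         # (5) criterion for halting the addition of hidden nodes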
In this section, we review our recently developed adaptive slope sigmoidal function constructive algorithms
[Sharma and Chandra, 2010a, Sharma and Chandra 2010b].
The number of nodes in the input and output layers is defined according to the characteristics of the given
problem. Both algorithms start from a minimal SLFNN with one hidden node, where the input and output nodes are
not directly connected, and during training one hidden node is added to the current network at a time. In the first
algorithm, each new hidden node is added in a separate hidden layer and is connected to the inputs, the output node,
as well as all previously added hidden nodes, thus constructing a cascading architecture [48]. In the second
algorithm, each new hidden node is added to the single hidden layer, thus constructing an SLFNN dynamically [49].
In both algorithms, we use an adaptive slope sigmoidal function at the non-linear hidden nodes, defined as
$g(x, b) = \frac{1}{1 + e^{-bx}}$   (4)
where $b$ is the slope parameter, which is adapted in the same way as the other weights during training. For very
small slope values the activation function effectively behaves as a constant-output function, reducing the node to a
threshold node (similar to the zero-th node of a layer); for large values of the slope, the functional map effectively
becomes equivalent to the step function. We start the slope parameter with a value of unity and update it so that it
reaches its optimal value. To avoid the saturation problem of the log-sigmoid function and to make the best use of
its non-linearity, we restrict the slope parameter to lie in the interval [0.1, 10].
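A minimal Python sketch of Eq. (4) and of treating the slope like any other trainable parameter is given below; the clamping helper simply enforces the [0.1, 10] interval mentioned above, and the function names are illustrative.

import numpy as np

def adaptive_sigmoid(x, b):
    """Eq. (4): logistic activation with adaptive slope b."""
    return 1.0 / (1.0 + np.exp(-b * x))

def slope_derivative(x, b):
    """Partial derivative of g(x, b) with respect to the slope,
    dg/db = x * g * (1 - g), so b can be updated like any other weight."""
    g = adaptive_sigmoid(x, b)
    return x * g * (1.0 - g)

def clamp_slope(b, low=0.1, high=10.0):
    """Restrict the slope to [0.1, 10] to avoid saturation and near-constant output."""
    return float(np.clip(b, low, high))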
In both algorithms, the n-th hidden node is added to the current network in the n-th iteration. Only the input and
output connection weights of the newly added node are trained, to further reduce the residual error; weight
freezing is used here to make the computation faster and to circumvent the moving target problem.
Specifically, we update the input and output connection weights of the newly added hidden node, the slope
parameter of its sigmoidal function and the bias of the output node by the gradient-descent optimization method
in sequential (pattern-by-pattern) mode, minimizing the squared-error objective function to further reduce the
residual error.
Each individual hidden node is trained for up to a fixed number of epochs. The optimal number of hidden
nodes is selected on the basis of cross-validation in the form of early stopping.
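A sketch of one sequential-mode (pattern-by-pattern) update for the newly added hidden node is given below, under the same assumptions: its input weights, output weight, slope and the output-node bias are adjusted by gradient descent on the squared residual error while everything else stays frozen. Variable names and the learning rate are illustrative.

import numpy as np

def sequential_update(x, e, w, v, b, bias, lr=0.05):
    """One pattern-by-pattern gradient step for the newly added hidden node.
    x    : input vector of the new hidden node for this pattern
    e    : residual error of the existing (frozen) network at the output node
    w, v : input weights and output weight of the new node
    b    : slope of its sigmoidal activation;  bias : bias of the output node
    """
    z = float(np.dot(w, x))
    g = 1.0 / (1.0 + np.exp(-b * z))              # Eq. (4)
    d = e - (v * g + bias)                        # error left after the new node's contribution
    # gradient descent on 0.5 * d**2 (all other network weights remain frozen)
    w = w + lr * d * v * g * (1.0 - g) * b * x    # input weights of the new node
    b = b + lr * d * v * g * (1.0 - g) * z        # slope, treated like any other weight
    v = v + lr * d * g                            # output weight of the new node
    bias = bias + lr * d                          # bias of the output node
    b = float(np.clip(b, 0.1, 10.0))              # keep the slope within [0.1, 10]
    return w, v, b, bias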
Due to the adaptive slope sigmoidal function, the step-size problem of the weight update is alleviated to some extent
and the non-linear hidden nodes are prevented from going into saturation. Simulation results also indicate that the
convergence properties, learning smoothness and generalization performance of the adaptive slope sigmoidal variant are
superior to those of the fixed-shape sigmoid variant [48], [49].
7. Conclusion
This paper has presented an overview of non-evolutionary constructive neural networks. Constructive neural
networks are a collection of methods that enable the network architecture to be constructed along
with the training process. The focus has been on constructive algorithms that construct feedforward architectures
for regression problems. In general, a constructive algorithm has two integral components: a pre-specified
network growing strategy and a local optimization technique for updating the weights during learning. The role of
the adaptive sigmoidal activation function in constructive neural networks is justified by better generalization
performance and lesser training time.
8. References
[1] Cybenko, G. (1989): Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2: 303-314.
[2] Hornik, K.; Stinchcombe, M.; White, H. (1989): Multilayer feedforward networks are universal approximators. Neural Networks, vol. 2(5), 359-366.
[3] Koza, J. R.; Rice, J. P. (1991): Genetic generation of both the weights and architecture for a neural network. in Proc. IEEE, IJCNN,
Seattle, WA, 1991, vol. 2, 397-404.
[4] Yao, X.; Liu, Y. (1997): A new evolutionary system for evolving artificial neural networks. Transactions on Neural Networks, vol. 8,
no. 3, 694-713.
[5] Wei, G. (2008): Evolutionary Neural Network Based on New Ant Colony Algorithm. International symposium on Computational
Intelligence and Design IEEE, 318-321.
[6] Huang, R.; Tong, S. (2009): Evolving Product Unit Neural Networks with Particle Swarm Optimization. Fifth International
Conference on Image and Graphics, IEEE Computer Society.
[7] Kwok, T. Y.; Yeung, D. Y. (1997a): Constructive Algorithms for Structure Learning in feedforward Neural Networks for Regression
Problems. IEEE Transactions on Neural Networks, 8 (3), 630-645.
[8] Reed, R. (1993): Pruning algorithms-A Survey. IEEE Transactions on Neural Networks, vol. 4, 740-747.
[9] Parekh, R.; Yang, J.; Honavar, V. (2000): Constructive neural-network learning algorithms for pattern classification. IEEE Transaction
on Neural Networks, vol. 11, no. 2, pp. 436-451.
[10] Fahlman, S. E.; Lebiere, C. (1990): The cascade correlation learning architecture. Advances in Neural Information Processing Systems
2, D. S. Touretzky, Ed. CA: Morgan Kaufmann, 524-532.
[11] Ash, T. (1989): Dynamic node creation in backpropagation networks. Connection Science, vol. 1, no. 4, 365-375.
[12] Friedman, J. H.; Stuetzle, W. (1981): Projection pursuit regression. J. Amer. Statist. Assoc., vol. 76, no. 376, pp. 817-823.
[13] Platt, J. (1991): A resource-allocating network for function interpolation. Neural Computation, vol. 3, pp. 213-225.
[14] Farlow, S. J. Eds. (1984): Self-Organizing Methods in Modeling: GMDH Type Algorithms, vol. 54 of Statistics: Textbooks and
Monographs. New York: Marcel Dekker.
[15] Nabhan, T. M.; Zomaya A. Y. (1994): Toward generating neural network structures for function approximation. Neural Networks, vol.
7, no. 1, pp. 89-90.
[16] Gallant, S. I. (1986): Three constructive algorithms for network learning. In IEEE Proc. 8th Conf. on Pattern Recognition, pp. 849-852.
[17] Mezard, M.; Nadal, J. P. (1989): Learning in feedforward layered networks: The Tiling algorithm. Journal of Physics A: Math. Gen.,
vol. 22, no. 12, pp. 2191- 2203.
[18] Frean, M. (1990): The Upstart algorithm: A method for constructing and training feed-forward neural networks. Neural Computation, vol.
2, pp. 198-209.
[19] Burgess, N. (1994): A constructive algorithm that converges for real-valued input patterns. Int. Journal of Neural Systems 5(1), pp. 59-66.
[20] Parekh, R.; Yang, J.; Honavar, V. (1997): Constructive neural network learning algorithms for multi-category pattern classification.
Artificial Intelligence Research Group, Department of Computer Science, 26 Atanasoff Hall, Iowa State University, Ames, Iowa,
USA.
[21] Nicoletti, M. C.; Bertini, J. R (2007): An empirical evaluation of constructive neural network algorithms in classification tasks. in Int.
J. Innovative Computing and Applications, vol. 1, no.1
[22] Nicoletti, M. C. et al. (2009): Constructive Neural Network Algorithm for feedforward Architectures Suitable for Classification Tasks.
pp. 1-23, In Leonardo Franco etc. (Eds.): Constructive Neural Networks (Studies in Computational Intelligence vol. 258), Springer.
[23] Marchand, M. et al., (1990): A convergence theorem for sequential learning in two layer perceptrons. Europhysics Letters 11(6), pp.
487-492.
[24] Young, S.; Downs, T. (1998): CARVE- A constructive algorithm for real valued examples. in IEEE Transactions on Neural
Networks, vol. 9, no. 6.
[25] Campbell, C.; Vicente, C. P. (1995): Constructing feed-forward neural networks for binary classification tasks. Advanced Computing
Research Centre, Bristol University, United Kingdom.
[26] Mascioli, F. M. F.; Martinelli, G. (1995): A Constructive algorithm for binary neural networks: the oil-spot algorithm. IEEE
Transaction on Neural Networks, 6, pp. 794-797.
[27] Draghici, S. (2001): The Constraint based decomposition (CBD) training architecture. Neural Networks, 14, pp. 527-550.
[28] Subirats, J. L.; Jerez, J. M. ; Franco L. (2008): A new decomposition algorithm for threshold synthesis and generalization of Boolean
functions. IEEE Transaction on Circuits and Systems I 55, pp. 3188-3196.
[29] Setiono, R.; Hui, L. C. K (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE
Transactions on Neural Networks, vol. 6, no. 1.
[30] Kwok, T. Y.; Yeung, D. Y. (1997b): Objective functions for training new hidden units in constructive neural networks. IEEE
Transactions on Neural Networks, vol. 8, no. 5, 1131-1148.
[31] Prechelt, L. (1997): Investigation of the CasCor family of learning algorithms. Neural Networks 10 (5), pp. 885-896.
[32] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (1999): Fixed Cascade Error-A Novel Constructive Neural network for
Structure Learning. Proceedings of the Artificial Neural Networks in Engineering Conference, ANNIE99, St. Louis, Missouri, USA.
[33] Lehtokangas, M. (1999): Modeling with constructive backpropagation. Neural Networks, vol. 12, 707-716.
[34] Treadgold, N. K.; Gedeon, T. D. (1997): A Cascade Network employing Progressive RPROP. International conference on Artificial
and Natural Neural Networks, pp. 733-742.
[35] Ma, L.; Khorasani, K. (2003): A new strategy for adaptively constructing multilayer feedforward neural networks. Neurocomputing
51, pp. 361-385.
[36] Islam, M. M. et al. (2009): A New Adaptive Merging and Growing Algorithm for Designing Artificial Neural Networks. IEEE
Transactions on Systems, Man, and Cybernetics- Part B: Cybernetics, vol. 39, no. 3.
[37] Rumelhart, D. E.; Hinton, G. E.; Williams R. J. (1986): Learning internal representations by error propagation. Parallel Distributed
Processing, vol. I, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 318-362.
[38] Fahlman, S. E. (1989): An empirical study of learning speed in backpropagation networks. Carnegie Mellon Univ., Pittsburg, PA,
Tech. Rep. CMU-CS-88-162.
[39] Riedmiller, M.; Braun, H. (1993): A direct adaptive method for faster backpropagation learning: The RPROP Algorithm. Proc. of the
IEEE Int. Conf. on Neural Networks, San Francisco, CA, 586-591
[40] Setiono, R.; Hui L. C. K. (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE
Transactions on Neural Networks, vol. 6, no. 1
[41] Hagan, M. T.; Menhaj, M. B. (1994): Training Feedforward Networks with the Marquardt algorithm. IEEE Transactions on Neural
Networks, vol. 5, no. 6, 989- 993
[42] Nechyba, M. C.; Xu, Y. (1994): Neural network approach to control system identification with variable activation functions. IEEE
International Symposium on Intelligent Control, Columbus, Ohio, USA.
[43] Hwang, J. N.; Shien, S.; Lay, S. R. (1996): The Cascade Correlation Learning: A Projection Pursuit Learning Perspective. IEEE
Transactions on Neural Networks, vol. 7, no. 2.
[44] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (2002): Evaluation of constructive neural networks with cascaded
architectures. Neurocomputing, vol. 48, pp. 573-607.
[45] Thivierge, J. -P.; Rivest F.; Shultz, T. R. (2003): A dual-phase technique for pruning constructive networks. In proceedings of the
IEEE International Joint Conference on Neural Networks, vol. 1, pp. 559-564.
[46] Islam, M.; Murase, K. (2000): A new algorithm to design compact two-hidden-layer artificial neural networks. Neural Networks, vol.
14, no. 9, pp. 1265-1278.
[47] Wang, D. (2008): Fast constructive-covering algorithm for neural networks and its implementation in classification. In Applied
SoftComputing 8, pp. 166-173.
[48] Sharma, S. K.; Chandra, P. (2010a): An adaptive slope sigmoidal function cascading neural networks algorithm. Proc. of the IEEE,
Third International Conference on Emerging Trends in Engineering and Technology (ICETET 2010), India, pp. 139-144, doi:
10.1109/ICETET.2010.71.
[49] Sharma, S. K.; Chandra, P. (2010b): An adaptive slope basic dynamic node creation algorithm for single hidden layer neural networks.
Proc. of the IEEE, International Conference on Computational Intelligence and Communication Systems (CICN 2010), India, pp. 531-539, doi: 10.1109/CICN.2010.38.