
Sudhir Kumar Sharma et al. / International Journal of Engineering Science and Technology, Vol. 2(12), 2010, 7847-7855

CONSTRUCTIVE NEURAL
NETWORKS: A REVIEW
Sudhir Kumar Sharma
Ansal Institute of Technology, GGS Indraprastha University
Gurgaon-1220033, Haryana, India

Pravin Chandra
Institute of Informatics & Communication, University of Delhi
South Campus, New Delhi, India

Abstract:
In conventional neural networks, the architecture has to be defined prior to training, whereas in constructive neural networks the network architecture is constructed during the training process. In this paper, we review constructive neural network algorithms that construct feedforward architectures for regression problems. The Cascade-Correlation algorithm (CCA) is a well-known and widely used constructive algorithm. The Cascade 2 algorithm, a variant of CCA that has been found more suitable for regression problems, is also reviewed. We then review our two recently proposed constructive algorithms, which emphasize both architectural adaptation and functional adaptation during training. To achieve functional adaptation, the slope of the sigmoidal function is adapted during learning. These algorithms determine not only the optimum number of hidden-layer nodes but also the optimum value of the slope parameter of the sigmoidal function. The role of the adaptive sigmoidal activation function in constructive neural networks has been verified in terms of better generalization performance and shorter training time.
Keywords: Adaptive slope sigmoidal function; Constructive neural networks; Constructive algorithm;
Cascade 2 algorithm.
1. Introduction
Many types of neural network models have been proposed for function approximation (pattern classification and regression problems). Among them, the class of multilayer feedforward neural networks (FNNs) is the most popular, due to its flexible structure, good representational capabilities (universal approximation) and the large number of available training algorithms.
In general, the learning accuracy, generalization ability and training time of supervised learning in FNNs depend on various factors, such as the chosen network architecture (number of hidden nodes and connection topology between nodes), the choice of activation function for each node, the choice of optimization method and the other training parameters (such as the learning rate and initial weights). The architecture of the network is either fixed empirically prior to training or is dynamically adjusted during training of the network for solving a specific problem.
If the architecture chosen for a fixed-size network is not appropriate, under-fitting or over-fitting takes place. For better generalization performance and shorter training time, a network architecture that is neither too small nor too large is desirable. We need a sufficient number of trainable parameters (weights, biases and parameters associated with the activation function) to capture the unknown mapping function from the training data.
Single hidden layer FNNs (SLFNNs) with a sufficient number of hidden nodes are universal approximators (UAPs), i.e., these models are capable of approximating any continuous function to any desired degree of accuracy [1], [2]. These results, however, give no guidance on selecting the optimum number of hidden nodes. There are, moreover, a number of situations where two hidden layers have been more effective in terms of generalization ability and training time. There are no known efficient methods for determining the optimum network architecture for a problem at hand, and the selection of the optimal network architecture remains an open problem.
The adaptive structure neural networks framework is a collection of techniques in which the network structure is adapted during training according to the given problem. Structure adaptation may take place at three levels, namely architecture adaptation, functional adaptation and training-parameter adaptation. These approaches can be classified into two different groups: evolutionary and non-evolutionary.
Many evolutionary algorithms have been proposed that evolve the network architecture together with the weights based on global optimization techniques, such as genetic algorithms, genetic programming and

evolutionary strategies [3], [4]. Global search methods such as ant colony optimization and particle swarm optimization are also widely used nowadays to determine the optimum architecture during learning [5], [6]. However, the evolutionary approach is quite demanding in terms of both time and user-defined parameters [7].
2. Non-Evolutionary Adaptive Structure Neural Networks
Unlike conventional neural network (NN) algorithms, which require the NN architecture to be defined before training starts, adaptive structure neural networks enable the network architecture to be constructed along with the training process.
Many methods have been proposed to determine the optimal network architecture during training, such as constructive, pruning, constructive-pruning and regularization algorithms. A constructive algorithm adds hidden layers, nodes and connections to a minimal NN architecture during training. A pruning algorithm does the opposite, i.e., it deletes redundant hidden layers, nodes and connections, starting from a larger NN, during training. A constructive-pruning algorithm is a hybrid approach in which the NN may be pruned after completion of the constructive process, or pruning may be interleaved with the constructive process. A regularization method adds a penalty term to the error function being minimized (or subtracts it from an objective being maximized) so that the effect of unimportant network connection weights is decreased in the trained network. The modified error function takes the form

Ẽ(W) = E(W) + λ R(W),

where E(W) is the training error function, R(W) is the regularization term and λ is a regularization parameter that controls the influence of the regularization term. The difficulty of using such a modified error function lies in choosing a suitable regularization parameter, which often requires trial and error. The regularization framework can be used with constructive and pruning algorithms [7], [8].
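As a concrete illustration, the sketch below computes such a modified error for one common choice of penalty: a sum-of-squares training error E(W) combined with an L2 (weight-decay) regularization term R(W). The function name, the choice of R(W) and the value of the regularization parameter are illustrative assumptions, not prescriptions from this paper.

```python
import numpy as np

def regularized_error(weights, errors, lam=1e-3):
    """Modified error Ẽ(W) = E(W) + lambda * R(W) (illustrative sketch).

    E(W) is taken here as the sum-of-squares training error and R(W) as an
    L2 weight-decay penalty; both are common choices, not the only ones.
    """
    e_train = 0.5 * np.sum(errors ** 2)   # E(W): training error over the data
    r_term = 0.5 * np.sum(weights ** 2)   # R(W): regularization term
    return e_train + lam * r_term         # lambda controls its influence
```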
Constructive algorithms have the following major advantages over pruning algorithms:
(1) It is relatively easier to specify an initial network architecture in constructive algorithms, whereas in pruning algorithms one usually does not know a priori how large the initial network should be. Therefore, an initial network that is much larger than actually required by the underlying problem is usually chosen in pruning algorithms, leading to a computationally expensive network training process.
(2) Constructive algorithms tend to build small networks due to their incremental learning nature. The constructed network corresponds to the complexity of the given problem, whereas pruning algorithms may spend an overly large effort removing the redundant weights and hidden nodes of an oversized network. Thus, constructive algorithms are generally more economical (in terms of training time and network complexity/structure) than pruning algorithms.
(3) In constructive algorithms, a smaller number of parameters (weights) has to be updated in the initial stage of the training process, thus requiring less training data for good generalization, while a sufficiently large training set is required in pruning algorithms.
(4) One common feature of constructive algorithms is the assumption that the hidden nodes already installed in the network are useful in modeling part of the underlying function. In that case, the weights feeding into these installed nodes can be frozen to avoid the moving target problem. The number of weights to be optimized at a time is reduced, so that time and memory requirements are decreased.
(5) In pruning algorithms and regularization methods, several problem-dependent parameters need to be properly specified or selected in order to obtain an acceptable network yielding satisfactory performance. This requirement makes these algorithms more difficult to use in real-life applications.
3. Constructive Neural Networks
Constructive neural networks (CoNN) is a collection of a group of algorithms that alters the network structure
as learning proceeds, producing automatically a network with an appropriate size. The learning algorithms used
in CoNN are called constructive algorithms. Constructive algorithm starts with a minimal network architecture
and adds layers, nodes and connections during the training, as required by the given problem. The architecture
adaptation process is continued till the training algorithm finds a near optimal architecture that gives satisfactory
solution of the problem.
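The outer loop shared by such algorithms can be summarized as in the following sketch. It is illustrative pseudocode only: add_node, train_network and satisfactory are placeholders standing in for the growth strategy, the local optimizer and the halting criterion of whichever constructive algorithm is being used.

```python
def constructive_training(network, train_set, val_set,
                          add_node, train_network, satisfactory):
    """Generic constructive loop: start minimal, grow until satisfactory.

    `add_node`, `train_network` and `satisfactory` are placeholders for the
    growth strategy, the local optimizer and the halting criterion of a
    particular constructive algorithm (assumptions, not a fixed interface).
    """
    train_network(network, train_set)          # train the minimal network first
    while not satisfactory(network, val_set):  # halting criterion, e.g. early stopping
        add_node(network)                      # grow: add node(s)/layer(s)/connections
        train_network(network, train_set)      # retrain (whole net, or only the new node)
    return network
```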
Six motivations for using constructive algorithms are listed with explanations in [Parekh et al., 2000]. These
are:
(1) Flexibility of exploring the space of neural network topologies
(2) Potential for matching the intrinsic complexity of the learning task
(3) Estimation of expected case complexity of the learning task
(4) Tradeoffs among performance measures
(5) Incorporation of prior knowledge
(6) Lifelong learning.
Most function approximation algorithms can solve both classification and regression problems; however, their efficacy may depend on the type of problem. Classification problems can be seen as a special case of regression problems in which only discrete outputs are allowed. This is why all algorithms for regression problems can also be used for classification problems, while the reverse is not always true.
Kwok and Yeung, 1997a survey the major constructive algorithms for regression problems. In their proposed taxonomy, based on the perspective of a state-space search, they group the algorithms into six different categories, each named after its most representative algorithm, as follows:
(1) Cascade-Correlation algorithm (CCA) that mostly groups variants of the Cascade architecture proposed by
Fahlman & Lebiere, 1990 [10].
(2) Dynamic node creation (DNC) algorithm proposed by Ash, 1989 [11].
(3) Projection pursuit regression, based on the statistical technique proposed by Friedman & Stuetzle, 1981[12].
(4) Resource-allocating network proposed by Platt, 1991 [13].
(5) Group method of data handling, a class of algorithms inspired by the GMDH proposed by Ivakhnenko and
described in Farlow, 1984 [14].
(6) Hybrid algorithm, which employs both a constructive and a pruning strategy, proposed by Nabhan & Zomaya, 1994 [15].
Among these, the most popular for function approximation problems is the CCA, and the next in popularity is DNC. The latter algorithm constructs an SLFNN, whereas the former constructs a cascade architecture.
Many CoNN algorithms suitable only for classification problems have been proposed in the neural network literature. The well-known CoNN algorithms for two-class classification are the Tower and Pyramid [Gallant, 1986], the Tiling [Mezard and Nadal, 1989], the Upstart [Frean, 1990] and the Perceptron Cascade [Burgess, 1994]. The multi-class versions of these algorithms are the MTower, MPyramid, MTiling, MUpstart and MPerceptron Cascade. These algorithms can be seen in [Parekh et al., 1997; Parekh et al., 2000].
Nicoletti and Bertini, 2007 evaluated empirically several two-class and multi-class CoNN algorithms. Nicoletti et al., 2009 reviewed several well-known CoNN algorithms suitable for classification tasks that construct feedforward architectures as a result of adaptive structured learning. They classified the algorithms into two groups: some directed by the minimization of classification errors and others based on a sequential learning model. The algorithms listed above are based on the minimization of classification errors. The CoNN algorithms based on the sequential learning model add a hidden node and train it as a partial classifier during training. The well-known CoNN algorithms in this group are the Irregular Partitioning algorithm [Marchand et al., 1990], the Carve algorithm [Young and Downs, 1998], the Target Switch algorithm [Campbell and Vicente, 1995], the Oil Spot algorithm [Mascioli and Martinelli, 1995], the Constraint Based Decomposition algorithm [Draghici, 2001] and the Decomposition Algorithm for Synthesis and Generalization [Subirats et al., 2008]. Many recently proposed CoNN algorithms for classification tasks can be found in [22] and the references therein.
4. Constructive Algorithms for Feedforward Neural Networks
In general, constructive algorithms produce three common resulting architectures as a result of structured learning: the SLFNN, the multilayer FNN and the cascade architecture. For example, a constructive algorithm starts with an SLFNN having zero or one hidden node and then iteratively adds one or a few hidden nodes to the current network at each step. The node(s) are added to the same hidden layer when designing SLFNNs, or to different hidden layers when designing multilayer FNNs. The cascade architecture is a special class of multilayer FNN with one node in each hidden layer, each hidden node receiving inputs from the network inputs and from the previously added hidden nodes.
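A minimal sketch of the forward pass through such a cascade architecture is given below, assuming a single linear output node, tanh hidden nodes and a particular weight layout; all of these choices are illustrative assumptions rather than details fixed by the paper.

```python
import numpy as np

def cascade_forward(x, hidden_weights, output_weights, act=np.tanh):
    """Forward pass through a cascade architecture (illustrative sketch).

    hidden_weights[n] is a 1-D array of length len(x) + n + 1: weights from
    the original inputs, the n previously installed hidden nodes and a bias.
    output_weights is a 1-D array of length len(x) + H + 1 for one output,
    where H is the number of installed hidden nodes.
    """
    feats = list(x)                               # network inputs
    for w in hidden_weights:
        z = np.dot(w, feats + [1.0])              # inputs + earlier hidden outputs + bias
        feats.append(act(z))                      # the new node's output becomes a feature
    return np.dot(output_weights, feats + [1.0])  # linear output node
```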
The DNC algorithm is probably the first constructive algorithm for designing SLFNNs dynamically. In the DNC algorithm, hidden nodes with a sigmoid activation function are added to the SLFNN iteratively. After each addition step, either the whole network or only the weights associated with the newly added node are trained. A large number of constructive algorithms following the DNC have been developed, e.g., [29], [30].
The cascade architecture was first proposed by Fahlman and Lebiere, 1990 in the cascade-correlation algorithm (CCA). Since the hidden nodes in the cascade architecture receive additional information from some nonlinear combination of the inputs (implemented by the previous hidden nodes), these nodes are termed higher-order nodes and are capable of performing a more complex function of the input variables. Several variants of CCA and similar algorithms have been proposed in the literature [31]-[34].

The addition of nodes in different hidden layers, however, is not straightforward, because one has to decide whether a node should be added to an existing hidden layer or to a new hidden layer. To tackle this issue, most existing algorithms add a predefined and fixed number of nodes in the first hidden layer, then add the same number of nodes in the second hidden layer, and so on [Ma and Khorasani, 2003; Monirul Islam et al., 2009]. As mentioned earlier, this number is crucial for the performance of NNs, and restricting it to a small value limits the ability of a hidden layer to form complicated feature detectors.
Any standard training algorithm based on local optimization methods for a fixed-size network architecture may be used in conjunction with the constructive approach to determine the optimum set of weights of the network. The usual choices are local optimization methods based on first-order gradient descent, like the standard backpropagation algorithm [37] or its variants such as the QuickProp algorithm [38] and the RPROP algorithm [39], or second-order methods (using the information of the Hessian matrix in some form or another), like the quasi-Newton method [40] or the Levenberg-Marquardt algorithm [41].
There are a variety of ways of training the resulting network after each hidden node addition in constructive algorithms. These can be classified into two general methods. The first consists of training the whole network after the addition of a new hidden node. The second consists of training only the newly added node, with the remaining weights frozen. The method for adding a new hidden node is standard across many constructive algorithms and in general consists of either adding a new hidden node when the error fails to improve by a set amount over a given period or testing for some criterion such as a local minimum. Halting network construction is equivalent to finding the best model for a given problem, and hence techniques such as early stopping are employed.
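One common way to implement the "error fails to improve by a set amount over a given period" test is a patience check, as sketched below. The window length and improvement threshold are illustrative assumptions, not values taken from any of the algorithms reviewed here.

```python
def should_add_node(error_history, patience=8, min_rel_improvement=0.01):
    """Trigger hidden-node addition when the training error stagnates.

    Returns True if the relative error improvement over the last `patience`
    epochs falls below `min_rel_improvement` (both values are illustrative).
    """
    if len(error_history) <= patience:
        return False
    old, new = error_history[-patience - 1], error_history[-1]
    return (old - new) / max(old, 1e-12) < min_rel_improvement
```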
5. Cascade-Correlation algorithm and its variants
The cascade-correlation algorithm (CCA) [Fahlman and Lebiere, 1990] was designed to overcome the local minima problem, the step-size problem and the moving target problem, and to avoid having to define the number of hidden nodes in advance. CCA is widely used for classification and function approximation tasks.
CCA adds one hidden node to the cascade architecture at a time; the hidden node is connected to all inputs as well as to the previously trained hidden nodes. After the training of the input weights of the current hidden node is completed, the node is connected to the output nodes with its input weights frozen, and all input weights of the output nodes are trained again. In the following subsections, several CCA variants and similar constructive algorithms are presented [10].
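The objective maximized while training a CCA candidate node is commonly written as the covariance (correlation) score S = Σ_k | Σ_p (V_p − V̄)(E_kp − Ē_k) |, where V_p is the candidate output and E_kp the residual error. The sketch below computes it for one candidate; the array shapes are an illustrative assumption.

```python
import numpy as np

def cca_covariance(candidate_out, residual_err):
    """Covariance objective maximized when training a CCA candidate (sketch).

    candidate_out: shape (P,),     candidate activations over the P patterns.
    residual_err:  shape (P, No),  residual errors at the No output nodes.
    Returns S = sum_k | sum_p (V_p - V_bar) * (E_kp - E_bar_k) |.
    """
    v = candidate_out - candidate_out.mean()        # centered candidate output
    e = residual_err - residual_err.mean(axis=0)    # centered residual errors
    return np.abs(v @ e).sum()                      # sum of |covariances| over outputs
```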
5.1. Cascade 2 algorithm
The Cascade 2 algorithm was also first proposed by Fahlman, who implemented the CCA. The Cascade 2 algorithm differs from CCA by training a new hidden node to directly minimize the residual error rather than to maximize the covariance between the hidden node output and the residual error at the output nodes. In addition, the hidden node has adjustable output connections to all of the output nodes; everything else is common to both algorithms. Several authors have demonstrated that CCA is effective for classification but not very successful on regression tasks. This is because the correlation term tends to drive the hidden node activations to their extreme values, thereby making it hard for the network to produce a smoothly varying output [31], [42], [43].
For the sake of clarity, the Cascade 2 algorithm is described by the flow-chart of Fig. 1, with the concrete contents of each step listed below:
(1) Step A: Initializing the NN
    (i) Create the training and testing sets and fix the training parameters for the given problem.
    (ii) Determine the number of nodes in the input and output layers according to the characteristics of the given problem.
    (iii) Fully connect the input nodes to the output nodes.
    (iv) Initialize all connection weight values.
(2) Step B1: Calculate the error over an epoch by

    E = (1/2) Σ_{p=1}^{P} Σ_{k=1}^{No} (d_kp − f_kp)²      (1)

    where d_kp is the desired output and f_kp is the actual output at the k-th output node for the p-th pattern, P is the total number of exemplars and No is the number of output nodes.
(3) Step B2: Judge whether the performance, which depends on the training error, is smaller than the pre-specified error or not.
(4) Step B3: Judge whether the overall (output nodes) stopping criterion, which depends on the overall patience (the percentage change in the network error required to continue training and the length of the patience period), is satisfied or not.
(5) Step B4: Update all weights connected to the output nodes by the gradient-descent method, to minimize the objective function described by (1).
(6) Step C1: Initializing a candidate
    (i) Connect all input nodes and previously installed hidden nodes to the candidate, and also connect the candidate to the output nodes.
    (ii) Initialize the new weights of the candidate.
(7) Step C2: Calculate the difference between the error of the output nodes and the input from the candidate to these nodes, defined as

    S = (1/2) Σ_{p=1}^{P} Σ_{k=1}^{No} (e_kp − ow_kn O_n)²      (2)

    where e_kp is the residual error at the k-th output node for pattern p and ow_kn O_n is the input from the n-th candidate node to the k-th output node, O_n being the output of the n-th candidate node and ow_kn the connection weight from the n-th candidate node to the k-th output node.
(8) Step C3: Judge whether the local (hidden node addition) stopping criterion, which depends on the local patience (the percentage change in the network error required to continue training and the length of the patience period), is satisfied or not.
(9) Step C4: Update all input and output weights of the hidden node being added by the gradient-descent method, to minimize the objective function described by (2), while the main NN is frozen.
(10) Step BC: Install the trained candidate into the NN; this is the interface between the B series and the C series.
    (i) All input connections to the candidate are frozen.
    (ii) The output weights of the candidate are inserted with inverted sign.
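For reference, the two objective functions used in the flow-chart, eq. (1) for the output-weight phase and eq. (2) for the candidate phase, can be computed as in the sketch below. The array shapes and function names are illustrative assumptions.

```python
import numpy as np

def network_error(desired, actual):
    """Objective (1): E = 1/2 * sum_p sum_k (d_kp - f_kp)^2."""
    return 0.5 * np.sum((desired - actual) ** 2)

def candidate_error(residual, cand_out, out_weights):
    """Objective (2): S = 1/2 * sum_p sum_k (e_kp - ow_kn * O_n)^2.

    residual:    shape (P, No), residual errors e_kp
    cand_out:    shape (P,),    candidate outputs O_n per pattern
    out_weights: shape (No,),   weights ow_kn from the candidate to the outputs
    """
    cand_input = np.outer(cand_out, out_weights)    # ow_kn * O_n for each pattern/output
    return 0.5 * np.sum((residual - cand_input) ** 2)
```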

5.2. Modified CCA using different objective functions


There are many ways to modify the CCA algorithm. One of them is to change the objective function used for training the hidden node. Kwok and Yeung, 1997b conducted a very careful investigation of the objective functions for training hidden nodes in constructive algorithms for regression tasks, aiming at deriving a class of objective functions whose value and corresponding weight update can be computed in O(P) time for a training set with P patterns. Any of these objective functions may be used in place of the covariance objective function in CCA [30].

5.3. Fixed cascade error

CCA has also inspired the proposal of the Fixed Cascade Error algorithm, described in [Lahnajarvi et al., 1999; Lahnajarvi et al., 2002]. While the general structure of both algorithms is the same, they differ in the way the hidden nodes are created. The candidate hidden nodes are trained to maximize the following objective function for a single output node:

S = Σ_p (e_p − ē) y_p      (3)

where p ranges over the training patterns, y_p is the activation of the candidate hidden node for pattern p, e_p is the residual error at the output node for pattern p and ē is the average error of the output node over all patterns [32], [44].
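A direct transcription of objective (3) into code might look as follows; it is a sketch for a single output node, and the names are illustrative.

```python
import numpy as np

def fce_objective(residual_err, cand_out):
    """Fixed Cascade Error objective (3): S = sum_p (e_p - e_bar) * y_p.

    residual_err: shape (P,), residual error e_p at the single output node.
    cand_out:     shape (P,), candidate activation y_p for each pattern.
    """
    return np.sum((residual_err - residual_err.mean()) * cand_out)
```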


Fig. 1. Cascade 2 algorithm

5.4. Modeling with constructive backpropagation


Lehtokangas, 1999 proposed constructive backpropagation (CBP), an approach similar to CCA. CBP is computationally just as efficient as CCA, even though the error needs to be back-propagated through no more than one hidden layer. Further, CBP has the same constructive benefits as CCA, but in addition it benefits from a simpler implementation and the ability to utilize stochastic optimization routines.
Moreover, it is shown how CBP can be extended to allow the addition of multiple new nodes simultaneously and how it can be used to perform continuous automatic structure adaptation, including both addition and deletion of nodes in batches. The performance of CBP learning was studied with time-series modeling experiments, which demonstrate that CBP can provide significantly better modeling capabilities than CCA learning [33].
5.5. A cascade network employing progressive RPROP
Treadgold and Gedeon, 1997 proposed a new cascade network algorithm employing Progressive RPROP (CasPer). CasPer is a constructive learning algorithm that builds a cascade network: it adds new hidden nodes to the NN and then trains the whole network, using different step-sizes for different parts of the network instead of freezing the input weights of the current hidden node. CasPer uses a variation of RPROP to train the whole network and is known to produce more compact networks with very promising results.
All the CCA variants and similar algorithms discussed above differ from each other in various aspects: the connectivity pattern of the new hidden node (cascade architecture or SLFNN), the activation function used at the non-linear hidden nodes, the objective function used for hidden-node training, the algorithm used for training the individual hidden node, the stopping criterion for candidate-node training and the halting criterion for node addition. Lastly, they can be classified on the basis of how the connection weights are frozen and retrained [34].
Prechelt, 1997 investigated problems and improvements in CCA. He developed six variants of CCA, one of them being the Cascade 2 algorithm. These variants were empirically compared using 42 different datasets from the PROBEN1 benchmark [31].
Thivierge et al., 2003 implemented an algorithm that simultaneously grows and prunes cascade networks. The pruning is done by removing irrelevant connections using the Optimal Brain Damage procedure [45].
Islam and Murase, 2000 proposed a cascade neural network design algorithm for two-hidden-layer FNNs. The method automatically determines the number of nodes in each hidden layer and can also reduce a two-hidden-layer network to a single-hidden-layer network. It is based on the use of a temporary weight-freezing technique [46].
The fast constructive-covering algorithm for neural network construction proposed in [Wang, 2008] is based on geometrical expansion. It has the advantage that each training example has to be learnt only once, which allows the algorithm to work faster than traditional training algorithms [47].
6. Adaptive slope sigmoidal function constructive algorithms
There are five major issues involved in constructive algorithms for regression tasks. These issues are as follows:
(1) The choice of minimal architecture and network growing strategy: How to connect a new hidden node in
the existing network?
(2) The choice of activation function: Which activation function to use at the hidden and output nodes?
(3) The choice of weight freezing: whether to train the entire network or only the newly added hidden node.
(4) The choice of optimization technique: Which optimization method to use to determine the optimum weights
during training?
(5) The choice of training stoppage criteria: When to stop the addition of new hidden nodes, or, in other words,
what is the optimal number of hidden nodes to be installed in the network?
The generalization ability and training time of constructive algorithms for regression tasks depend on each
choice discussed above.
In this section, we review our recently developed adaptive slope sigmoidal function constructive algorithms [Sharma and Chandra, 2010a; Sharma and Chandra, 2010b].
The number of nodes in the input and output layers is defined according to the characteristics of a given problem. We start from a minimal SLFNN with one hidden node, where the input and output nodes are not directly connected. The algorithm starts from this minimal architecture and, during training, one hidden node is added to the current network at a time. In the first algorithm, the hidden node is added in a separate hidden layer and is connected to the inputs, the output node and all previously added hidden nodes, thus constructing a cascading architecture [48]. In the second algorithm, the hidden node is added in a single hidden layer, thus constructing an SLFNN dynamically [49].
In both algorithms, we used an adaptive slope sigmoidal function at the non-linear hidden nodes, defined as

g(x, b) = 1 / (1 + e^(−bx))      (4)

where b is the slope parameter, adapted in the same way as the other weights during training. For very small slope values the activation function effectively behaves as a constant-output function, reducing the effect of the node to that of a threshold node (similar to the zero-th node of a layer), while for large values of the slope the functional map of the output effectively becomes equivalent to the step function. We start the slope parameter with a value of unity and update it so that it reaches its optimal value. To avoid the saturation problem of the log-sigmoid function and to make best use of its non-linearity, we restrict the slope parameter to lie in the interval [0.1, 10].
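A minimal sketch of the adaptive slope sigmoid of eq. (4), together with one illustrative gradient-descent update of the slope parameter clipped to [0.1, 10], is given below. The learning rate and the way the upstream gradient is passed in are assumptions for illustration, not details taken from the papers.

```python
import numpy as np

def sigmoid(x, b):
    """Adaptive slope log-sigmoid, eq. (4): g(x, b) = 1 / (1 + exp(-b*x))."""
    return 1.0 / (1.0 + np.exp(-b * x))

def update_slope(b, x, upstream_grad, lr=0.01):
    """One gradient-descent step on the slope parameter b (illustrative).

    Uses dg/db = x * g * (1 - g); the learning rate is an assumption.
    The updated slope is clipped to [0.1, 10] to avoid saturation,
    as described in the text.
    """
    g = sigmoid(x, b)
    b_new = b - lr * upstream_grad * x * g * (1.0 - g)
    return np.clip(b_new, 0.1, 10.0)
```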
In both algorithms, the n-th hidden node is added to the current network in the n-th iteration. Only the input and output connection weights of the newly added node are trained, to further reduce the residual error. Weight freezing is used here to make computation faster and to circumvent the moving target problem.
In both algorithms, we update the input and output connection weights of the newly added hidden node, the slope parameter of its sigmoidal function and the bias of the output node using a gradient-descent optimization method in sequential mode, minimizing the squared-error objective function to further reduce the residual error.
Each individual hidden node is trained for up to a fixed number of epochs. The optimal number of hidden nodes is selected on the basis of cross-validation in the form of early stopping.
Due to the adaptive slope sigmoidal function, the step-size problem of the weight update is alleviated to some extent and the non-linear hidden nodes are prevented from going into saturation. Simulation results also indicate that the convergence behaviour, smoothness of learning and generalization performance of the adaptive slope variants are superior to those of the fixed-shape sigmoid variants [48], [49].

7. Conclusion
This paper presents an overview of non-evolutionary constructive neural networks. Constructive neural networks are a collection of methods which enable the network architecture to be constructed along with the training process. The focus has been on constructive algorithms that construct feedforward architectures for regression problems. In general, a constructive algorithm has two integral components: a pre-specified network growing strategy and a local optimization technique for updating the weights during learning. The role of the adaptive sigmoidal activation function in constructive neural networks is justified by better generalization performance and shorter training time.
8. References
[1] Cybenko, G. (1989): Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303-314.
[2] Hornik, K.; Stinchcombe, M.; White, H. (1989): Multilayer feedforward networks are universal approximators. Neural Networks, vol. 2(5), 359-366.
[3] Koza, J. R.; Rice, J. P. (1991): Genetic generation of both the weights and architecture for a neural network. in Proc. IEEE, IJCNN,
Seattle, WA, 1991, vol. 2, 397-404.
[4] Yao, X.; Liu, Y. (1997): A new evolutionary system for evolving artificial neural networks. Transactions on Neural Networks, vol. 8,
no. 3, 694-713.
[5] Wei, G. (2008): Evolutionary Neural Network Based on New Ant Colony Algorithm. International symposium on Computational
Intelligence and Design IEEE, 318-321.
[6] Huang, R.; Tong, S. (2009): Evolving Product Unit Neural Networks with Particle Swarm Optimization. Fifth International
Conference on Image and Graphics, IEEE Computer Society.
[7] Kwok, T. Y.; Yeung, D. Y. (1997a): Constructive Algorithms for Structure Learning in feedforward Neural Networks for Regression
Problems. IEEE Transactions on Neural Networks, 8 (3), 630-645.
[8] Reed, R. (1993): Pruning algorithms-A Survey. IEEE Transactions on Neural Networks, vol. 4, 740-747.
[9] Parekh, R.; Yang, J.; Honavar, V. (2000): Constructive neural-network learning algorithms for pattern classification. IEEE Transaction
on Neural Networks, vol. 11, no. 2, pp. 436-451.
[10] Fahlman, S. E.; Lebiere, C. (1990): The cascade correlation learning architecture. Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed. CA: Morgan Kaufmann, 524-532.
[11] Ash, T. (1989): Dynamic node creation in backpropagation networks. Connection Science, vol. 1, no. 4, 365-375.

[12] Friedman, J. H.; Stuetzle, W. (1981): Projection pursuit regression. J. Amer. Statist. Assoc., vol. 76, no. 376, pp. 817-823.
[13] Platt, J. (1991): A resource-allocating network for function interpolation. Neural Computation, vol. 3, pp. 213-225.
[14] Farlow, S. J. Eds. (1984): Self-Organizing Methods in Modeling: GMDH Type Algorithms, vol. 54 of Statistics: Textbooks and
Monographs. New York: Marcel Dekker.
[15] Nabhan, T. M.; Zomaya A. Y. (1994): Toward generating neural network structures for function approximation. Neural Networks, vol.
7, no. 1, pp. 89-90.
[16] Gallant, S. I. (1986): Three constructive algorithms for network learning. in IEEE Proc. 8th Conf. on Pattern Recognition, pp. 849-852.
[17] Mezard, M.; Nadal, J. P. (1989): Learning in feedforward layered networks: The Tiling algorithm. Journal of Physics A: Math. Gen.,
vol. 22, no. 12, pp. 2191- 2203.
[18] Frean, M (1990): The Upstart algorithm: A method for constructing and training feed-forward neural networks. Neural Networks, vol.
2, pp. 198-209.
[19] Burgess, N. (1994): A constructive algorithm that converges for real-valued input patterns. Int. Journal of Neural Systems, 5(1), pp. 59-66.
[20] Parekh, R.; Yang, J.; Honavar, V. (1997): Constructive Neural network Learning Algorithms for multi-category pattern classification.
in Artificial Intelligence Research Group, Department of Computer Science, 26 Atanasoff Hall, Iowa State University, Ames, Iowa,
USA.
[21] Nicoletti, M. C.; Bertini, J. R (2007): An empirical evaluation of constructive neural network algorithms in classification tasks. in Int.
J. Innovative Computing and Applications, vol. 1, no.1
[22] Nicoletti, M. C. et al. (2009): Constructive Neural Network Algorithm for feedforward Architectures Suitable for Classification Tasks.
pp. 1-23, In Leonardo Franco etc. (Eds.): Constructive Neural Networks (Studies in Computational Intelligence vol. 258), Springer.
[23] Marchand, M. et al., (1990): A convergence theorem for sequential learning in two layer perceptrons. Europhysics Letters 11(6), pp.
487-492.
[24] Young, S.; Downs, T. (1998): CARVE- A constructive algorithm for real valued examples. in IEEE Transactions on Neural
Networks, vol. 9, no. 6.

[25] Campbell, C.; Vicente, C. P. (1995): Constructing feed-forward neural networks for binary classification tasks. Advanced Computing
Research Centre, Bristol University, United Kingdom.
[26] Mascioli, F. M. F.; Martinelli, G. (1995): A Constructive algorithm for binary neural networks: the oil-spot algorithm. IEEE
Transaction on Neural Networks, 6, pp. 794-797.
[27] Draghici, S. (2001): The Constraint based decomposition (CBD) training architecture. Neural Networks, 14, pp. 527-550.
[28] Subirats, J. L.; Jerez, J. M. ; Franco L. (2008): A new decomposition algorithm for threshold synthesis and generalization of Boolean
functions. IEEE Transaction on Circuits and Systems I 55, pp. 3188-3196.
[29] Setiono, R.; Hui, L. C. K (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE
Transactions on Neural Networks, vol. 6, no. 1.
[30] Kwok, T. Y.; Yeung, D. Y. (1997b): Objective functions for training new hidden units in constructive neural networks. IEEE Transactions on Neural Networks, vol. 8, no. 5, 1131-1148.
[31] Prechelt, L. (1997): Investigation of the CasCor family of learning algorithms. Neural Networks, 10(5), pp. 885-896.
[32] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (1999): Fixed Cascade Error-A Novel Constructive Neural network for
Structure Learning. Proceedings of the Artificial Neural Networks in Engineering Conference, ANNIE99, St. Louis, Missouri, USA.
[33] Lehtokangas, M. (1999): Modeling with constructive backpropagation. Neural Networks, vol. 12, 707-716.
[34] Treadgold, N. K.; Gedeon, T. D. (1997): A Cascade Network employing Progressive RPROP. International conference on Artificial
and Natural Neural Networks, pp. 733-742.
[35] Ma, L.; Khorasani, K. (2003): A new strategy for adaptively constructing multilayer feedforward neural networks. Neurocomputing, 51, pp. 361-385.
[36] Islam, M. M. et al. (2009): A New Adaptive Merging and Growing Algorithm for Designing Artificial Neural Networks. IEEE
Transactions on Systems, Man, and Cybernetics- Part B: Cybernetics, vol. 39, no. 3.
[37] Rumelhart, D. E.; Hinton, G. E.; Williams R. J. (1986): Learning internal representations by error propagation. Parallel Distributed
Processing, vol. I, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 318-362.
[38] Fahlman, S. E. (1989): An empirical study of learning speed in backpropagation networks. Carnegie Mellon Univ., Pittsburg, PA,
Tech. Rep. CMU-CS-88-162.
[39] Riedmiller, M.; Braun, H. (1993): A direct adaptive method for faster backpropagation learning: The RPROP Algorithm. Proc. of the
IEEE Int. Conf. on Neural Networks, San Francisco, CA, 586-591
[40] Setiono, R.; Hui L. C. K. (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE
Transactions on Neural Networks, vol. 6, no. 1
[41] Hagan, M. T.; Menhaj, M. B. (1994): Training Feedforward Networks with the Marquardt algorithm. IEEE Transactions on Neural
Networks, vol. 5, no. 6, 989- 993
[42] Nechyba, M. C.; Xu, Y. (1994): Neural network approach to control system identification with variable activation functions. IEEE
International Symposium on Intelligent Control, Columbus, Ohio, USA.
[43] Hwang, J. N.; Shien, S.; Lay, S. R. (1996): The Cascade Correlation Learning: A Projection Pursuit Learning Perspective. IEEE
Transactions on Neural Networks, vol. 7, no. 2.
[44] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (2002): Evaluation of constructive neural networks with cascaded
architectures. Neurocomputing, vol. 48, pp. 573-607.
[45] Thivierge, J. -P.; Rivest F.; Shultz, T. R. (2003): A dual-phase technique for pruning constructive networks. In proceedings of the
IEEE International Joint Conference on Neural Networks, vol. 1, pp. 559-564.
[46] Islam, M.; Murase, K. (2000): A new algorithm to design compact two-hidden-layer artificial neural networks. Neural Networks, vol. 14, no. 9, pp. 1265-1278.
[47] Wang, D. (2008): Fast constructive-covering algorithm for neural networks and its implementation in classification. In Applied
SoftComputing 8, pp. 166-173.
[48] Sharma, S. K.; Chandra, P. (2010a): An adaptive slope sigmoidal function cascading neural networks algorithm. Proc. of the IEEE,
Third International Conference on Emerging Trends in Engineering and Technology (ICETET 2010), India, pp. 139-144, doi:
10.1109/ICETET.2010.71.
[49] Sharma, S. K.; Chandra, P. (2010b): An adaptive slope basic dynamic node creation algorithm for single hidden layer neural networks. Proc. of the IEEE International Conference on Computational Intelligence and Communication Systems (CICN 2010), India, pp. 531-539, doi: 10.1109/CICN.2010.38.

