The net input to unit i is given by

    net_i = Σ_j w_ij x_j + θ_i

where net_i describes the result of the net inputs x_j (weighted by the weights w_ij) impacting on unit i. Here, w_ij are the weights connecting neuron j to neuron i, x_j is the output from unit j and θ_i is the threshold for neuron i. The threshold term is the baseline input to a node in the absence of any other inputs. If a weight w_ij is negative, it is termed inhibitory because it decreases the net input; otherwise it is called excitatory.
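As an illustration only, the net input of a single unit can be computed as a weighted sum of incoming outputs plus the threshold; a minimal sketch in Python (the numerical values are arbitrary examples):

    import numpy as np

    x = np.array([0.5, -1.0, 0.25])    # outputs x_j of the units feeding unit i
    w_i = np.array([0.8, 0.1, -0.4])   # weights w_ij into unit i (negative weights are inhibitory)
    theta_i = 0.2                      # threshold (baseline input) of unit i

    net_i = np.dot(w_i, x) + theta_i   # net_i = sum_j w_ij * x_j + theta_i
    print(net_i)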
Each unit takes its net input and applies an activation function to it. For example, the output of the j-th unit, also called the activation value of the unit, is g(Σ_i w_ji x_i), where g(.) is the activation function and x_i is the output of the i-th unit connected to unit j. A number of nonlinear functions have been used in the literature as activation functions. The threshold function is useful in situations where the inputs and outputs are binary encoded. However, the most common choice is a sigmoid function, such as

    g(netinput) = 1 / (1 + e^(-netinput))

or

    g(netinput) = tanh(netinput)
The activation function can take a great variety of forms and has the biggest impact on the behaviour and performance of the ANN. Its main task is to map the outlying values of the obtained net input back to a bounded interval such as [0, 1] or [−1, 1]. The sigmoid function has some advantages, due to its differentiability within the
context of finding a steepest descent gradient for the backpropagation method and
moreover maps a wide domain of values into the interval [0, 1].
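For illustration, the two activation functions above can be written directly in Python (a minimal sketch; the function names are chosen here only for clarity):

    import numpy as np

    def logistic(net_input):
        # Logistic sigmoid: maps any real net input into the interval (0, 1)
        return 1.0 / (1.0 + np.exp(-net_input))

    def tanh_activation(net_input):
        # Hyperbolic tangent: maps any real net input into the interval (-1, 1)
        return np.tanh(net_input)

    print(logistic(0.4), tanh_activation(0.4))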
The various steps in developing a neural network forecasting model are:
3.1 Variable Selection
The input variables important for modeling/forecasting the variable(s) under study are selected by suitable variable selection procedures.
3.2 Formation of Training, Testing and Validation Sets
The data set is divided into three distinct sets called training, testing and validation sets. The training set is the largest set and is used by the neural network to learn the patterns present in the data. The testing set is used to evaluate the generalization ability of a supposedly trained network. A final check on the performance of the trained network is made using the validation set.
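A minimal sketch of such a split in Python is given below; the 70/20/10 proportions are only illustrative (they follow the guideline mentioned later in Section 6), and the data are split chronologically because forecasting data are ordered in time:

    import numpy as np

    def split_series(data, train_frac=0.7, test_frac=0.2):
        # Split an ordered data set into training, testing and validation sets,
        # keeping the training set the largest, as described in Section 3.2.
        n = len(data)
        n_train = int(n * train_frac)
        n_test = int(n * test_frac)
        return (data[:n_train],
                data[n_train:n_train + n_test],
                data[n_train + n_test:])

    series = np.arange(100.0)                        # placeholder data
    train, test, validation = split_series(series)
    print(len(train), len(test), len(validation))    # 70 20 10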
3.3 Neural Network Architecture
Neural network architecture defines its structure, including the number of hidden layers, the number of hidden nodes, the number of output nodes, etc.
(i) Number of hidden layers: The hidden layer(s) provide the network with its ability to
generalize. In theory, a neural network with one hidden layer with a sufficient
number of hidden neurons is capable of approximating any continuous function. In
practice, neural networks with one, and occasionally two, hidden layers are widely used and have been found to perform very well.
(ii) Number of hidden nodes: There is no magic formula for selecting the optimum number of hidden neurons. However, some rules of thumb are available for calculating the number of hidden neurons. A rough approximation can be obtained from the geometric pyramid rule proposed by Masters (1993): for a three-layer network with n input and m output neurons, the hidden layer would have sqrt(n*m) neurons (see the sketch after this list).
(iii) Number of output nodes: Neural networks with multiple outputs, especially if these
outputs are widely spaced, will produce inferior results as compared to a network
with a single output.
(iv) Activation function: Activation functions are mathematical formulae that determine
the output of a processing node. Each unit takes its net input and applies an
activation function to it. Nonlinear functions such as the logistic and tanh functions have been used as activation functions. The purpose of the transfer function is to prevent the output from reaching very large values, which can paralyze neural networks and thereby inhibit training. Transfer functions such as the sigmoid are commonly used because they are nonlinear and continuously differentiable, properties which are desirable for network learning.
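As referred to in point (ii), a minimal sketch of the geometric pyramid rule, with purely illustrative input and output counts:

    import math

    def pyramid_rule(n_inputs, n_outputs):
        # Geometric pyramid rule (Masters, 1993): about sqrt(n*m) hidden neurons
        # for a three-layer network with n inputs and m outputs.
        return round(math.sqrt(n_inputs * n_outputs))

    # Example: 9 input variables and 1 output variable -> 3 hidden neurons
    print(pyramid_rule(9, 1))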
3.4 Model Building
The multilayer feedforward neural network, or multilayer perceptron (MLP), is very popular and is used more than any other neural network type for a wide variety of tasks. A multilayer feedforward neural network trained by the backpropagation algorithm is based on a supervised procedure, i.e., the network constructs a model based on examples of data with known outputs. It has to build the model up solely from the examples presented, which are together assumed to implicitly contain the information necessary to establish the relation.
An MLP is a powerful system, often capable of modeling complex relationships between variables. It allows prediction of an output object for a given input object. The architecture of an MLP is a layered feedforward neural network in which the nonlinear elements (neurons) are arranged in successive layers, and information flows unidirectionally from the input layer to the output layer through the hidden layer(s). The characteristics of the multilayer perceptron are as follows:
(i) has any number of inputs
(ii) has one or more hidden layers with any number of nodes. The internal layers are called hidden because they only receive internal input (input from other processing units) and produce internal output (output to other processing units). Consequently, they are hidden from the outside world.
(iii) uses a linear combination function in the hidden and output layers
(iv) generally uses a sigmoid activation function in the hidden layers
(v) has any number of outputs with any activation function.
(vi) has connections between the input layer and the first hidden layer, between the
hidden layers, and between the last hidden layer and the output layer.
An MLP with just one hidden layer can learn to approximate virtually any function to any degree of accuracy. For this reason MLPs are known as universal approximators and can be used when we have little prior knowledge of the relationship between inputs and targets. One hidden layer is always sufficient provided we have enough data. A schematic representation of a neural network is given in Fig. 4.1 and a mathematical representation is given in Fig. 4.2.
Fig. 4.1: Schematic representation of neural network
Fig. 4.2: Mathematical representation of neural network
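As a concrete illustration of the layered feedforward structure described above, the following is a minimal sketch of a forward pass through an MLP with one hidden layer; the weights are random placeholders rather than trained values, with a logistic activation in the hidden layer and a linear output:

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_hidden, n_outputs = 3, 4, 1

    # Placeholder weights and thresholds (in practice these are learned from data)
    W_hidden = rng.normal(size=(n_hidden, n_inputs))
    b_hidden = np.zeros(n_hidden)
    W_output = rng.normal(size=(n_outputs, n_hidden))
    b_output = np.zeros(n_outputs)

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        # Information flows unidirectionally: input -> hidden -> output
        hidden = logistic(W_hidden @ x + b_hidden)   # sigmoid activation in the hidden layer
        return W_output @ hidden + b_output          # linear output layer

    x = np.array([0.5, -1.0, 0.25])                  # example input vector
    print(forward(x))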
Each interconnection in an ANN has a strength that is expressed by a number referred to as a weight. Learning is accomplished by adjusting the weights of the interconnections according to some learning algorithm. Learning methods in neural networks can be broadly classified into three basic types: (i) supervised learning, (ii) unsupervised learning and (iii) reinforced learning. In an MLP, supervised learning is used for adjusting the weights. A graphic representation of this learning is given in Fig. 4.3.
4. Architecture of Neural Networks
There are several types of ANN architecture. However, the two most widely used are discussed below:
Feedforward Networks
Feedforward ANNs allow signals to travel one way only; from input to output. There is no
feedback (loops) i.e. the output of any layer does not affect that same layer. They are
extensively used in pattern recognition.
Feedback/Recurrent Networks
Feedback networks can have signals traveling in both directions by introducing loops in
the network. Feedback networks are dynamic; their 'state' is changing continuously until
they reach an equilibrium point. They remain at the equilibrium point until the input
changes and a new equilibrium needs to be found.
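Purely as an illustration of this difference, the sketch below iterates the state of a small feedback network until it settles at an equilibrium point; the weight matrix and update rule are arbitrary examples, not a specific published model:

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.3, size=(4, 4))    # recurrent (feedback) connections
    x_in = np.array([1.0, -0.5, 0.25, 0.0])   # external input, held fixed

    state = np.zeros(4)
    for step in range(100):
        new_state = np.tanh(W @ state + x_in)            # signals loop back into the layer
        if np.max(np.abs(new_state - state)) < 1e-6:
            break                                         # equilibrium point reached
        state = new_state

    print(step, state)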
5. Supervised and Unsupervised Learning
A learning law describes the weight vector for the i-th processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:

    w_i(t+1) = w_i(t) + Δw_i(t),

where Δw_i(t) is the change in the weight vector.
The network adapts as follows: change the weight by an amount proportional to the difference between the desired output and the actual output. As an equation:

    ΔW_i = η (D − Y) I_i
Fig. 4.3: A learning cycle in the ANN model (the input vector is presented to the ANN, the resulting output vector is compared with the target vector, and the differences are used to adjust the weights)
where η is the learning rate, D is the desired output, Y is the actual output, and I_i is the i-th input. This is called the Perceptron Learning Rule. The weights in an ANN, similar to coefficients in a regression model, are adjusted to solve the problem presented to the ANN. Learning or training is the term used to describe the process of finding the values of these weights.
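A minimal sketch of the Perceptron Learning Rule as a single weight update in Python; the learning rate, input vector and desired output are illustrative placeholders:

    import numpy as np

    eta = 0.1                              # learning rate
    w = np.zeros(3)                        # initial weights
    I = np.array([1.0, 0.5, -1.0])         # input vector
    D = 1.0                                # desired output

    Y = 1.0 if np.dot(w, I) > 0 else 0.0   # actual (threshold) output
    w = w + eta * (D - Y) * I              # delta W_i = eta * (D - Y) * I_i
    print(w)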
Two types of learning with ANN are supervised and unsupervised learning.
Supervised learning incorporates an external teacher, so that each output unit is told
what its desired response to input signals ought to be. During the learning process global
information may be required. An important issue concerning supervised learning is the
problem of error convergence, i.e. the minimization of error between the desired and
computed unit values. The aim is to determine a set of weights which minimizes the error.
Unsupervised learning uses no external teacher and is based upon only local information.
We say that a neural network learns off-line if the learning phase and the operation phase
are distinct. A neural network learns on-line if it learns and operates at the same time.
Usually, supervised learning is performed off-line, whereas unsupervised learning is
performed on-line.
6. Further Concepts on Hidden Layers
There are really two decisions that must be made with regards to the hidden layers. The
first is how many hidden layers to actually have in the neural network. Secondly, you
must determine how many neurons will be in each of these layers. We will first examine
how to determine the number of hidden layers to use with the neural network. The effect of different numbers of hidden layers is summarized in Table 6.1.
Table 6.1: Determining the number of hidden layers
Number of Hidden Layers | Result
None | Only capable of representing linearly separable functions or decisions.
1 | Can approximate arbitrarily well any function which contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
Just deciding the number of hidden neuron layers is only a small part of the problem. You
must also determine how many neurons will be in each of these hidden layers. This
process is covered below.
There are many rule-of-thumb methods for determining the correct number of neurons to
use in the hidden layers. Some of them are summarized as follows:
- The number of hidden neurons should be in the range between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 of the input layer size, plus the size of the output layer.
- The number of hidden neurons should be less than twice the input layer size.
These three rules are only starting points. Ultimately, the selection of the architecture of the neural network will come down to trial and error. One approach is to split the data set into about 70%, 20% and 10% parts, using the first set for training and the second for validation, and to end learning if the reported value of the error function increases again after a longer phase of reduction earlier in the learning.
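A minimal sketch of this stopping idea in Python; train_one_epoch and validation_error are hypothetical helper functions supplied by the user, used here only to show the control flow:

    def train_with_early_stopping(model, train_set, validation_set,
                                  train_one_epoch, validation_error,
                                  max_epochs=1000, patience=20):
        # Stop training once the error starts rising again after a phase of reduction.
        best_error = float("inf")
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model, train_set)                  # one pass over the training part
            error = validation_error(model, validation_set)    # error on the validation part
            if error < best_error:
                best_error = error
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                                      # error has increased again: stop
        return model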
7. Backpropagation Algorithm
The MLP network is trained using one of the supervised learning algorithms of which the
best known example is backpropagation, which uses the data to adjust the network's
weights and thresholds so as to minimize the error in its predictions on the training set.
We denote by W_ij the weight of the connection from unit u_i to unit u_j. It is then convenient to represent the pattern of connectivity in the network by a weight matrix W whose elements are the weights W_ij. The pattern of connectivity characterizes the architecture of the network. A unit in the output layer determines its activity by following a two-step procedure.
First, it computes the total weighted input x_j, using the formula:

    x_j = Σ_i y_i W_ij
where y_i is the activity level of the i-th unit in the previous layer and W_ij is the weight of the connection between the i-th and the j-th unit.
Next, the unit calculates the activity y_j using some function of the total weighted input.
Typically we use the sigmoid function:

    y_j = 1 / (1 + e^(-x_j))
Once the activities of all output units have been determined, the network computes the
error E, which is defined by the expression:
    E = (1/2) Σ_j (y_j − d_j)²

where y_j is the activity level of the j-th unit in the top layer and d_j is the desired output of the j-th unit.
The back propagation algorithm consists of four steps:
(i) Compute how fast the error changes as the activity of an output unit is changed. This
error derivative (EA) is the difference between the actual and the desired activity.
    EA_j = ∂E/∂y_j = y_j − d_j
(ii) Compute how fast the error changes as the total input received by an output unit is
changed. This quantity (EI) is the answer from step (i) multiplied by the rate at
which the output of a unit changes as its total input is changed.
    EI_j = ∂E/∂x_j = (∂E/∂y_j)(dy_j/dx_j) = EA_j y_j (1 − y_j)
(iii) Compute how fast the error changes as a weight on the connection into an output
unit is changed. This quantity (EW) is the answer from step (ii) multiplied by the
activity level of the unit from which the connection emanates.
    EW_ij = ∂E/∂W_ij = (∂E/∂x_j)(∂x_j/∂W_ij) = EI_j y_i
(iv) Compute how fast the error changes as the activity of a unit in the previous layer is
changed. This crucial step allows back propagation to be applied to multilayer
networks. When the activity of a unit in the previous layer changes, it affects the
activities of all the output units to which it is connected. So to compute the overall
effect on the error, we add together all these separate effects on output units. But
each effect is simple to calculate. It is the answer in step (iii) multiplied by the
weight on the connection to that output unit.
    EA_i = ∂E/∂y_i = Σ_j (∂E/∂x_j)(∂x_j/∂y_i) = Σ_j EI_j W_ij
By using steps (ii) and (iv), we can convert the EAs of one layer of units into EAs for the
previous layer. This procedure can be repeated to get the EAs for as many previous layers
as desired. Once we know the EA of a unit, we can use steps (ii) and (iii) to compute the
EWs on its incoming connections.
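A minimal sketch of these four steps in Python for a network with a single hidden layer, assuming logistic units and using the EA/EI/EW notation above; the array shapes and learning rate are illustrative choices:

    import numpy as np

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, d, W_hid, W_out, eta=0.5):
        # One forward and backward pass; x is the input vector, d the desired output.
        # Forward pass
        y_hid = logistic(W_hid @ x)              # hidden activities
        y_out = logistic(W_out @ y_hid)          # output activities

        EA_out = y_out - d                       # step (i):   EA = y - d
        EI_out = EA_out * y_out * (1.0 - y_out)  # step (ii):  EI = EA * y * (1 - y)
        EW_out = np.outer(EI_out, y_hid)         # step (iii): EW_ij = EI_j * y_i
        EA_hid = W_out.T @ EI_out                # step (iv):  EA_i = sum_j EI_j * W_ij

        # Repeat steps (ii) and (iii) for the hidden layer
        EI_hid = EA_hid * y_hid * (1.0 - y_hid)
        EW_hid = np.outer(EI_hid, x)

        # Gradient-descent weight update
        return W_hid - eta * EW_hid, W_out - eta * EW_out

    rng = np.random.default_rng(0)
    W_hid = rng.normal(size=(4, 3))
    W_out = rng.normal(size=(1, 4))
    W_hid, W_out = backprop_step(np.array([0.5, -1.0, 0.25]), np.array([1.0]), W_hid, W_out)
    print(W_out)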
References and Suggested Reading
Anderson, J. A. (2003). An Introduction to Neural Networks. Prentice Hall.
Cheng, B. and Titterington, D. M. (1994). Neural networks: A review from a statistical perspective. Statistical Science, 9, 2-54.
Dewolf, E. D. and Francl, L. J. (1997). Neural networks that distinguish infection periods of wheat tan spot in an outdoor environment. Phytopathology, 87(1), 83-87.
Dewolf, E. D. and Francl, L. J. (2000). Neural network classification of tan spot and Stagonospora blotch infection periods in a wheat field environment. Phytopathology, 20(2), 108-113.
Gaudart, J., Giusiano, B. and Huiart, L. (2004). Comparison of the performance of multi-layer perceptron and linear regression for epidemiological data. Computational Statistics & Data Analysis, 44, 547-570.
Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. Cambridge: MIT Press.
Hebb, D. O. (1949). The Organization of Behaviour: A Neuropsychological Theory. Wiley, New York.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences (USA), 79, 2554-2558.
Kaastra, I. and Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 215-236.
Kumar, M., Raghuwanshi, N. S., Singh, R., Wallender, W. W. and Pruitt, W. O. (2002). Estimating evapotranspiration using artificial neural network. Journal of Irrigation and Drainage Engineering, 128(4), 224-233.
Madhav, Kandala Venu (2003). Study of statistical modeling techniques in agriculture. Ph.D. thesis, IARI.
Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press, New York.
McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Pal, S., Das, J., Sengupta, P. and Banerjee, S. K. (2002). Short term prediction of atmospheric temperature using neural networks. Mausam, 53, 471-480.
Patterson, D. (1996). Artificial Neural Networks. Singapore: Prentice Hall.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (D. E. Rumelhart, J. L. McClelland and the PDP Research Group, eds.), Cambridge, MA: MIT Press, 318-362.
Saanzogni, Louis and Kerr, Don (2001). Milk production estimate using feed forward artificial neural networks. Computers and Electronics in Agriculture, 32, 21-30.
Schalkoff, R. J. (1997). Artificial Neural Networks. McGraw-Hill.
Warner, B. and Misra, M. (1996). Understanding neural networks as statistical tools. The American Statistician, 50, 284-293.
Widrow, B. and Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON Convention Record, 4, 96-104.
Yegnanarayana, B. (1999). Artificial Neural Networks. Prentice Hall.
Zhang, G., Patuwo, B. E. and Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35-62.