The net input to unit i is given by

    net_i = Σ_j w_ij x_j + θ_i

where net_i describes the result of the net inputs x_j (weighted by the weights w_ij) impacting on unit i. Here, w_ij are the weights connecting neuron j to neuron i, x_j is the output from unit j and θ_i is the threshold for neuron i. The threshold term is the baseline input to a node in the absence of any other inputs. If a weight w_ij is negative, it is termed inhibitory because it decreases the net input; otherwise it is called excitatory.
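As an illustration only, the net input of a single unit can be computed as a weighted sum of incoming outputs plus the threshold; a minimal sketch in Python (the numerical values are arbitrary examples):

    import numpy as np

    x = np.array([0.5, -1.0, 0.25])    # outputs x_j of the units feeding unit i
    w_i = np.array([0.8, 0.1, -0.4])   # weights w_ij into unit i (negative weights are inhibitory)
    theta_i = 0.2                      # threshold (baseline input) of unit i

    net_i = np.dot(w_i, x) + theta_i   # net_i = sum_j w_ij * x_j + theta_i
    print(net_i)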
Each unit takes its net input and applies an activation function to it. For example, the output of the j-th unit, also called the activation value of the unit, is g(Σ_i w_ji x_i), where g(.) is the activation function and x_i is the output of the i-th unit connected to unit j. A number of nonlinear functions have been used in the literature as activation functions. The threshold function is useful in situations where the inputs and outputs are binary encoded. However, the most common choice is a sigmoid function, such as

    g(netinput) = 1 / (1 + e^(-netinput))

or

    g(netinput) = tanh(netinput)
The activation function can take a great variety of forms and has the biggest impact on the behaviour and performance of the ANN. Its main task is to map the outlying values of the obtained net input back to a bounded interval such as [0, 1] or [−1, 1]. The sigmoid function has some advantages, due to its differentiability within the
context of finding a steepest descent gradient for the backpropagation method and
moreover maps a wide domain of values into the interval [0, 1].
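For illustration, the two activation functions above can be written directly in Python (a minimal sketch; the function names are chosen here only for clarity):

    import numpy as np

    def logistic(net_input):
        # Logistic sigmoid: maps any real net input into the interval (0, 1)
        return 1.0 / (1.0 + np.exp(-net_input))

    def tanh_activation(net_input):
        # Hyperbolic tangent: maps any real net input into the interval (-1, 1)
        return np.tanh(net_input)

    print(logistic(0.4), tanh_activation(0.4))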
The various steps in developing a neural network forecasting model are:
3.1 Variable Selection
The input variables important for modeling/forecasting the variable(s) under study are selected by suitable variable selection procedures.
3.2 Formation of Training, Testing and Validation Sets
The data set is divided into three distinct sets called training, testing and validation sets. The training set is the largest set and is used by the neural network to learn the patterns present in the data. The testing set is used to evaluate the generalization ability of a supposedly trained network. A final check on the performance of the trained network is made using the validation set.
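A minimal sketch of such a split in Python is given below; the 70/20/10 proportions are only illustrative (they follow the guideline mentioned later in Section 6), and the data are split chronologically because forecasting data are ordered in time:

    import numpy as np

    def split_series(data, train_frac=0.7, test_frac=0.2):
        # Split an ordered data set into training, testing and validation sets,
        # keeping the training set the largest, as described in Section 3.2.
        n = len(data)
        n_train = int(n * train_frac)
        n_test = int(n * test_frac)
        return (data[:n_train],
                data[n_train:n_train + n_test],
                data[n_train + n_test:])

    series = np.arange(100.0)                        # placeholder data
    train, test, validation = split_series(series)
    print(len(train), len(test), len(validation))    # 70 20 10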
3.3 Neural Network Architecture
Neural network architecture defines its structure, including the number of hidden layers, the number of hidden nodes, the number of output nodes, etc.
(i) Number of hidden layers: The hidden layer(s) provide the network with its ability to
generalize. In theory, a neural network with one hidden layer with a sufficient
number of hidden neurons is capable of approximating any continuous function. In
practice, neural networks with one, and occasionally two, hidden layers are widely used and have been found to perform very well.
(ii) Number of hidden nodes: There is no magic formula for selecting the optimum number of hidden neurons. However, some rules of thumb are available for calculating the number of hidden neurons. A rough approximation can be obtained from the geometric pyramid rule proposed by Masters (1993): for a three-layer network with n input and m output neurons, the hidden layer would have sqrt(n*m) neurons (see the sketch after this list).
(iii) Number of output nodes: Neural networks with multiple outputs, especially if these
outputs are widely spaced, will produce inferior results as compared to a network
with a single output.
(iv) Activation function: Activation functions are mathematical formulae that determine
the output of a processing node. Each unit takes its net input and applies an
activation function to it. Nonlinear functions such as the logistic and tanh functions have been used as activation functions. The purpose of the transfer function is to prevent the output from reaching very large values, which can paralyze neural networks and thereby inhibit training. Transfer functions such as the sigmoid are commonly used because they are nonlinear and continuously differentiable, properties which are desirable for network learning.
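As referred to in point (ii), a minimal sketch of the geometric pyramid rule, with purely illustrative input and output counts:

    import math

    def pyramid_rule(n_inputs, n_outputs):
        # Geometric pyramid rule (Masters, 1993): about sqrt(n*m) hidden neurons
        # for a three-layer network with n inputs and m outputs.
        return round(math.sqrt(n_inputs * n_outputs))

    # Example: 9 input variables and 1 output variable -> 3 hidden neurons
    print(pyramid_rule(9, 1))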
3.4 Model Building
The multilayer feedforward neural network, or multilayer perceptron (MLP), is very popular and is used more than any other neural network type for a wide variety of tasks. A multilayer feedforward neural network trained by the backpropagation algorithm is based on a supervised procedure, i.e., the network constructs a model based on examples of data with known outputs. It has to build the model up solely from the examples presented, which are together assumed to implicitly contain the information necessary to establish the relation.
An MLP is a powerful system, often capable of modeling complex relationships between variables. It allows prediction of an output object for a given input object. The architecture of an MLP is a layered feedforward neural network in which the nonlinear elements (neurons) are arranged in successive layers, and information flows unidirectionally from the input layer to the output layer through the hidden layer(s). The characteristics of the multilayer perceptron are as follows:
(i) has any number of inputs
(ii) has one or more hidden layers with any number of nodes. The internal layers are called hidden because they only receive internal input (input from other processing units) and produce internal output (output to other processing units). Consequently, they are hidden from the outside world.
(iii) uses a linear combination function in the hidden and output layers
(iv) generally uses a sigmoid activation function in the hidden layers
(v) has any number of outputs with any activation function.
(vi) has connections between the input layer and the first hidden layer, between the
hidden layers, and between the last hidden layer and the output layer.
An MLP with just one hidden layer can learn to approximate virtually any function to any degree of accuracy. For this reason MLPs are known as universal approximators and can be used when we have little prior knowledge of the relationship between inputs and targets. One hidden layer is always sufficient provided we have enough data. A schematic representation of a neural network is given in Fig. 4.1 and a mathematical representation is given in Fig. 4.2.
Fig. 4.1: Schematic representation of neural network
Fig. 4.2: Mathematical representation of neural network
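As a concrete illustration of the layered feedforward structure described above, the following is a minimal sketch of a forward pass through an MLP with one hidden layer; the weights are random placeholders rather than trained values, with a logistic activation in the hidden layer and a linear output:

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_hidden, n_outputs = 3, 4, 1

    # Placeholder weights and thresholds (in practice these are learned from data)
    W_hidden = rng.normal(size=(n_hidden, n_inputs))
    b_hidden = np.zeros(n_hidden)
    W_output = rng.normal(size=(n_outputs, n_hidden))
    b_output = np.zeros(n_outputs)

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        # Information flows unidirectionally: input -> hidden -> output
        hidden = logistic(W_hidden @ x + b_hidden)   # sigmoid activation in the hidden layer
        return W_output @ hidden + b_output          # linear output layer

    x = np.array([0.5, -1.0, 0.25])                  # example input vector
    print(forward(x))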
Each interconnection in an ANN has a strength that is expressed by a number referred to as a weight. Learning is accomplished by adjusting the weights of the interconnections according to some learning algorithm. Learning methods in neural networks can be broadly classified into three basic types: (i) supervised learning, (ii) unsupervised learning and (iii) reinforced learning. In an MLP, supervised learning is used for adjusting the weights. A graphic representation of this learning is given in Fig. 4.3.
4. Architecture of Neural Networks
There are several types of ANN architecture. However, the two most widely used are discussed below:
Feedforward Networks
Feedforward ANNs allow signals to travel one way only; from input to output. There is no
feedback (loops) i.e. the output of any layer does not affect that same layer. They are
extensively used in pattern recognition.
Feedback/Recurrent Networks
Feedback networks can have signals traveling in both directions by introducing loops in
the network. Feedback networks are dynamic; their 'state' is changing continuously until
they reach an equilibrium point. They remain at the equilibrium point until the input
changes and a new equilibrium needs to be found.
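Purely as an illustration of this difference, the sketch below iterates the state of a small feedback network until it settles at an equilibrium point; the weight matrix and update rule are arbitrary examples, not a specific published model:

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.3, size=(4, 4))    # recurrent (feedback) connections
    x_in = np.array([1.0, -0.5, 0.25, 0.0])   # external input, held fixed

    state = np.zeros(4)
    for step in range(100):
        new_state = np.tanh(W @ state + x_in)            # signals loop back into the layer
        if np.max(np.abs(new_state - state)) < 1e-6:
            break                                         # equilibrium point reached
        state = new_state

    print(step, state)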
5. Supervised and Unsupervised Learning
A learning law describes the weight vector for the i-th processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:

    w_i(t+1) = w_i(t) + Δw_i(t),

where Δw_i(t) is the change in the weight vector.
The network adapts as follows: change the weight by an amount proportional to the difference between the desired output and the actual output. As an equation:

    ΔW_i = η (D − Y) I_i
Fig. 4.3: A learning cycle in the ANN model (the input vector is presented to the ANN, the resulting output vector is compared with the target vector, and the differences are used to adjust the weights)
where η is the learning rate, D is the desired output, Y is the actual output, and I_i is the i-th input. This is called the Perceptron Learning Rule. The weights in an ANN, similar to coefficients in a regression model, are adjusted to solve the problem presented to the ANN. Learning or training is the term used to describe the process of finding the values of these weights.
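A minimal sketch of the Perceptron Learning Rule as a single weight update in Python; the learning rate, input vector and desired output are illustrative placeholders:

    import numpy as np

    eta = 0.1                              # learning rate
    w = np.zeros(3)                        # initial weights
    I = np.array([1.0, 0.5, -1.0])         # input vector
    D = 1.0                                # desired output

    Y = 1.0 if np.dot(w, I) > 0 else 0.0   # actual (threshold) output
    w = w + eta * (D - Y) * I              # delta W_i = eta * (D - Y) * I_i
    print(w)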
Two types of learning with ANN are supervised and unsupervised learning.
Supervised learning incorporates an external teacher, so that each output unit is told
what its desired response to input signals ought to be. During the learning process global
information may be required. An important issue concerning supervised learning is the
problem of error convergence, i.e. the minimization of error between the desired and
computed unit values. The aim is to determine a set of weights which minimizes the error.
Unsupervised learning uses no external teacher and is based upon only local information.
We say that a neural network learns off-line if the learning phase and the operation phase
are distinct. A neural network learns on-line if it learns and operates at the same time.
Usually, supervised learning is performed off-line, whereas unsupervised learning is
performed on-line.
6. Further Concepts on Hidden Layers
There are really two decisions that must be made with regards to the hidden layers. The
first is how many hidden layers to actually have in the neural network. Secondly, you
must determine how many neurons will be in each of these layers. We will first examine
how to determine the number of hidden layers to use with the neural network. The effect of different numbers of hidden layers is summarized in Table 6.1.
Table 6.1: Determining the number of hidden layers
Number of Hidden Layers | Result
None | Only capable of representing linearly separable functions or decisions.
1 | Can approximate arbitrarily well any function which contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
Just deciding the number of hidden neuron layers is only a small part of the problem. You
must also determine how many neurons will be in each of these hidden layers. This
process is covered below.
There are many rule-of-thumb methods for determining the correct number of neurons to
use in the hidden layers. Some of them are summarized as follows:
- The number of hidden neurons should be in the range between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 of the input layer size, plus the size of the output layer.
- The number of hidden neurons should be less than twice the input layer size.
These three rules are only starting points. Ultimately, the selection of the architecture of the neural network will come down to trial and error. One approach is to split the data set into about 70%, 20% and 10% parts, using the first set for training and the second for validation, and to end learning if the reported value of the error function increases again after a longer phase of reduction earlier in the learning.
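A minimal sketch of this stopping idea in Python; train_one_epoch and validation_error are hypothetical helper functions supplied by the user, used here only to show the control flow:

    def train_with_early_stopping(model, train_set, validation_set,
                                  train_one_epoch, validation_error,
                                  max_epochs=1000, patience=20):
        # Stop training once the error starts rising again after a phase of reduction.
        best_error = float("inf")
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model, train_set)                  # one pass over the training part
            error = validation_error(model, validation_set)    # error on the validation part
            if error < best_error:
                best_error = error
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                                      # error has increased again: stop
        return model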
7. Backpropagation Algorithm
The MLP network is trained using one of the supervised learning algorithms of which the
best known example is backpropagation, which uses the data to adjust the network's
weights and thresholds so as to minimize the error in its predictions on the training set.
We denote by W_ij the weight of the connection from unit u_i to unit u_j. It is then convenient to represent the pattern of connectivity in the network by a weight matrix W whose elements are the weights W_ij. The pattern of connectivity characterizes the architecture of the network. A unit in the output layer determines its activity by following a two-step procedure.
First, it computes the total weighted input x_j, using the formula:

    x_j = Σ_i y_i W_ij
where y_i is the activity level of the i-th unit in the previous layer and W_ij is the weight of the connection between the i-th and the j-th unit.
Next, the unit calculates the activity y_j using some function of the total weighted input.
Typically we use the sigmoid function:

    y_j = 1 / (1 + e^(-x_j))
Once the activities of all output units have been determined, the network computes the
error E, which is defined by the expression:
    E = (1/2) Σ_j (y_j − d_j)²

where y_j is the activity level of the j-th unit in the top layer and d_j is the desired output of the j-th unit.
The back propagation algorithm consists of four steps:
(i) Compute how fast the error changes as the activity of an output unit is changed. This
error derivative (EA) is the difference between the actual and the desired activity.
    EA_j = ∂E/∂y_j = y_j − d_j
(ii) Compute how fast the error changes as the total input received by an output unit is
changed. This quantity (EI) is the answer from step (i) multiplied by the rate at
which the output of a unit changes as its total input is changed.
    EI_j = ∂E/∂x_j = (∂E/∂y_j)(dy_j/dx_j) = EA_j y_j (1 − y_j)
(iii) Compute how fast the error changes as a weight on the connection into an output
unit is changed. This quantity (EW) is the answer from step (ii) multiplied by the
activity level of the unit from which the connection emanates.
    EW_ij = ∂E/∂W_ij = (∂E/∂x_j)(∂x_j/∂W_ij) = EI_j y_i
(iv) Compute how fast the error changes as the activity of a unit in the previous layer is
changed. This crucial step allows back propagation to be applied to multilayer
networks. When the activity of a unit in the previous layer changes, it affects the
activities of all the output units to which it is connected. So to compute the overall
effect on the error, we add together all these separate effects on output units. But
each effect is simple to calculate. It is the answer in step (iii) multiplied by the
weight on the connection to that output unit.
    EA_i = ∂E/∂y_i = Σ_j (∂E/∂x_j)(∂x_j/∂y_i) = Σ_j EI_j W_ij
By using steps (ii) and (iv), we can convert the EAs of one layer of units into EAs for the
previous layer. This procedure can be repeated to get the EAs for as many previous layers
as desired. Once we know the EA of a unit, we can use steps (ii) and (iii) to compute the
EWs on its incoming connections.
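A minimal sketch of these four steps in Python for a network with a single hidden layer, assuming logistic units and using the EA/EI/EW notation above; the array shapes and learning rate are illustrative choices:

    import numpy as np

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, d, W_hid, W_out, eta=0.5):
        # One forward and backward pass; x is the input vector, d the desired output.
        # Forward pass
        y_hid = logistic(W_hid @ x)              # hidden activities
        y_out = logistic(W_out @ y_hid)          # output activities

        EA_out = y_out - d                       # step (i):   EA = y - d
        EI_out = EA_out * y_out * (1.0 - y_out)  # step (ii):  EI = EA * y * (1 - y)
        EW_out = np.outer(EI_out, y_hid)         # step (iii): EW_ij = EI_j * y_i
        EA_hid = W_out.T @ EI_out                # step (iv):  EA_i = sum_j EI_j * W_ij

        # Repeat steps (ii) and (iii) for the hidden layer
        EI_hid = EA_hid * y_hid * (1.0 - y_hid)
        EW_hid = np.outer(EI_hid, x)

        # Gradient-descent weight update
        return W_hid - eta * EW_hid, W_out - eta * EW_out

    rng = np.random.default_rng(0)
    W_hid = rng.normal(size=(4, 3))
    W_out = rng.normal(size=(1, 4))
    W_hid, W_out = backprop_step(np.array([0.5, -1.0, 0.25]), np.array([1.0]), W_hid, W_out)
    print(W_out)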
References and Suggested Reading
Anderson, J. A. (2003). An Introduction to Neural Networks. Prentice Hall.
Cheng, B. and Titterington, D. M. (1994). Neural networks: A review from a statistical perspective. Statistical Science, 9, 2-54.
Dewolf, E. D. and Francl, L. J. (1997). Neural networks that distinguish infection periods of wheat tan spot in an outdoor environment. Phytopathology, 87(1), 83-87.
Dewolf, E. D. and Francl, L. J. (2000). Neural network classification of tan spot and Stagonospora blotch infection periods in a wheat field environment. Phytopathology, 20(2), 108-113.
Gaudart, J., Giusiano, B. and Huiart, L. (2004). Comparison of the performance of multi-layer perceptron and linear regression for epidemiological data. Computational Statistics & Data Analysis, 44, 547-570.
Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. Cambridge: MIT Press.
Hebb, D. O. (1949). The Organization of Behaviour: A Neuropsychological Theory. Wiley, New York.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences (USA), 79, 2554-2558.
Kaastra, I. and Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 215-236.
Kumar, M., Raghuwanshi, N. S., Singh, R., Wallender, W. W. and Pruitt, W. O. (2002). Estimating evapotranspiration using artificial neural network. Journal of Irrigation and Drainage Engineering, 128(4), 224-233.
Madhav, Kandala Venu (2003). Study of statistical modeling techniques in agriculture. Ph.D. thesis, IARI.
Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press, New York.
McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Pal, S., Das, J., Sengupta, P. and Banerjee, S. K. (2002). Short term prediction of atmospheric temperature using neural networks. Mausam, 53, 471-480.
Patterson, D. (1996). Artificial Neural Networks. Singapore: Prentice Hall.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (D. E. Rumelhart, J. L. McClelland and the PDP Research Group, eds.), Cambridge, MA: MIT Press, 318-362.
Saanzogni, Louis and Kerr, Don (2001). Milk production estimate using feed forward artificial neural networks. Computers and Electronics in Agriculture, 32, 21-30.
Schalkoff, R. J. (1997). Artificial Neural Networks. McGraw-Hill.
Warner, B. and Misra, M. (1996). Understanding neural networks as statistical tools. The American Statistician, 50, 284-293.
Widrow, B. and Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON Convention Record, 4, 96-104.
Yegnanarayana, B. (1999). Artificial Neural Networks. Prentice Hall.
Zhang, G., Patuwo, B. E. and Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35-62.