A R T I F I C I A L N e U R A L N e T W o R K S
A R T I F I C I A L N e U R A L N e T W o R K S
A R T I F I C I A L N e U R A L N e T W o R K S
4.1 INTRODUCTION
The brain stores information as patterns. Some of these patterns are very
complicated and allow us the ability to recognize individual faces from many
different angles. This process of storing information as patterns, utilizing those
patterns, and then solving the problems encompasses a new field in computing,
which does not utilize traditional programming but involves the creation of
massively parallel networks and the training of those networks to solve specific
problems.
The exact workings of the human brain are still a mystery, yet some aspects
are known. The most basic element of the human brain is a specific type of cell,
called ‘neuron’. These neurons provide the abilities to remember, think, and apply
previous experiences to our every action. They are about 100 billion in number and
each of these neurons connects itself with about 200,000 other neurons, although
1,000 to 10,000 is typical. The power of the human mind comes from the sheer
numbers of these basic components and the multiple connections between them. It
also comes from genetic programming and learning.
The individual neurons are complicated. They have a myriad of parts, sub-
systems and control mechanisms. They convey information via a host of
74
electrochemical pathways. Together, these neurons and their connections form a
process, which is not binary, not stable, and not synchronous.
Within humans there are many variations on basic type of neuron, yet, all
biological neurons have the same four basic components. They are known by their
biological names – cell body (soma), dendrites, axon, and synapses.
Cell body (Soma): The body of neuron cell contains the nucleus and carries out
biochemical transformation necessary to the life of neurons.
Dendrite: Each neuron has fine, hair like tubular structures (extensions) around it.
They branch out into tree around the cell body. They accept incoming signals.
Axon: It is a long, thin, tubular structure which works like a transmission line.
Synapse: Neurons are connected to one another in complex spatial arrangement.
When axon reaches its final destination it branches again called as terminal
arborization. At the end of axon are highly complex and specialized structures called
synapses. Connection between two neurons takes place at these synapses.
75
Dendrites receive the input through the synapses of other neurons. The soma
processes these incoming signals over time and converts that processed value into
an output, which is sent out to other neurons through the axon and the synapses.
The artificial neuron simulates four basic functions of a biological neuron. Fig.
4.2 shows basic representation of an artificial neuron.
In Fig. 4.2, various inputs to the network are represented by the mathematical
symbol, x(n). Each of these inputs is multiplied by a connection weight. The weights
are represented by w(n). In the simplest case, these products are summed, fed to a
transfer function (activation function) to generate a result, and this result is sent as
output. This is also possible with other network structures, which utilize different
summing functions as well as different transfer functions.
76
binary properties of ORing and ANDing of inputs along with summing operations.
Such functions can be built into the summation and transfer functions of a network.
The summation function can be more complex than just weight sum of
products. The input and weighting coefficients can be combined in many different
ways before passing on to the transfer function. In addition to summing, the
summation function can select the minimum, maximum, majority, product or
several normalizing algorithms. The specific algorithm for combining neural inputs
is determined by the chosen network architecture and paradigm. Some summation
functions have an additional ‘activation function’ applied to the result before it is
passed on to the transfer function for the purpose of allowing the summation output
to vary with respect to time.
77
Component 3. Transfer Function: The result of the summation function is
transformed to a working output through an algorithmic process known as the
transfer function. In the transfer function the summation can be compared with
some threshold to determine the neural output. If the sum is greater than the
threshold value, the processing element generates a signal and if it is less than the
threshold, no signal (or some inhibitory signal) is generated. Both types of response
are significant. The threshold, or transfer function, is generally non-linear. Linear
functions are limited because the output is simply proportional to the input.
The step type of transfer function would output zero and one, one and minus
one, or other numeric combinations. Another type, the ‘threshold’ or ramping
function, can mirror the input within a given range and still act as a step function
outside that range. It is a linear function that is clipped to minimum and maximum
values, making it non-linear. Another option is a ‘S’ curve, which approaches a
minimum and maximum value at the asymptotes. It is called a sigmoid when it
ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1.
Both the function and its derivatives are continuous
Component 4. Scaling and Limiting: After the transfer function, the result can pass
through additional processes, which scale and limit. This scaling simply multiplies a
scale factor times the transfer value and then adds an offset. Limiting is the
mechanism which insures that the scaled result does not exceed an upper, or lower
bound. This limiting is in addition to the hard limits that the original transfer
function may have performed.
78
processing elements unless they have great strength. Competition can occur at one
or both levels. First, competition determines which artificial neuron will be active or
provides an output. Second, competitive inputs help determine which processing
element will participate in the learning or adaptation process.
79
4.4 AN ARTIFICIAL NEURAL NETWORK
Fig. 4.3 shows an artificial neural network. Inputs enter into the processing
element from the upper left. The first step is to multiply each of these inputs by their
respective weighting factor [w(n)]. These modified inputs are then fed into the
summing function, which usually sums these products, however, many different
types of operations can be selected. These operations can produce a number of
different values, which are then propagated forward; values such as the average, the
largest, the smallest, the ORed values, the ANDed values, etc. Other types of
summing functions can also be created and sometimes they may be further
complicated by the addition of an activation function which enables the summing
function to operate in a time sensitive way.
The output of the summing function is then sent into a transfer function,
which turns this number into a real output (a 0 or a 1, -1 or +1 or some other
number) via some algorithm. The transfer function can also scale the output or
80
control its value via thresholds. This output is then sent to other processing elements
or an outside connection, as dictated by the structure of the network.
The transfer function for neural networks must be differential and therefore
continuous to enable correcting error. Derivative of the transfer function is required
for computation of local gradient. One such example of a suitable transfer function is
the sigmoid function. The sigmoid function is a S-shaped graph. It is one of the most
common forms of transfer function used in construction of ANNs. It is defined as a
strictly increasing function. Mathematically its derivative is always positive. It
exhibits a graceful balance between linear and nonlinear behavior. One example of it
1
is a logistic function represented by the equation: φ (v) =
1+ e
This function has certain characteristic. At extremes of ϕ(v): ϕ (v) is flat and ϕ’(v) is
very small. At midrange of ϕ (v): ϕ’(v) is maximum as seen in Fig. 4.4.
81
Several other transfer functions can also be employed as shown in Fig. 4.5.
The neurons can be clustered together in many ways. This clustering occurs
in the human mind in such a way that information can be processed in a dynamic,
interactive, and self-organizing way. Biologically, neural networks are constructed
in a three-dimensional world from microscopic components. These neurons seem
capable of nearly unrestricted interconnections. That is not true of any existing man-
made network. Neural networks are the simple clustering of artificial neurons by
creating layers and interconnections as shown in Fig. 4.6.
82
Fig. 4.6: Layer arrangement in a neural network
In most networks, each neuron in a hidden layer receives the signals from all
the neurons typically from the input layer. After a neuron performs its function, it
passes its output to all of the neurons from typically the output layer, providing a
feed-forward path. This gives a variable strength to an input. There are two types of
these connections. One causes the summing mechanism of the next neuron to add
(excite) while the other causes it to subtract (inhibit). Some networks want a neuron
to inhibit the other neurons in the same layer, called ‘lateral inhibition’. The most
common use of this is in the output layer, e.g., in text recognition, if the probability
of a character being ‘P’ is 0.85 and if the same being ‘F’ is 0.65, the network wants to
83
choose the highest probability and inhibit all the others. It can do that with lateral
inhibition (competition).
A neural network in which the input layer of source nodes projects into an output
layer of neurons but not vice-versa is known as single feed-forward or acyclic
network. In single layer network, ‘single layer’ refers to the output layer of
computation nodes as shown in Fig. 4.7.
This type of network (Fig. 4.8) consists of one or more hidden layers, whose
computation nodes are called hidden neurons or hidden units. The function of
hidden neurons is to interact between the external input and network output in
some useful manner and to extract higher order statistics. The source nodes in input
layer of network supply the input signal to neurons in the second layer (1st hidden
layer). The output signals of 2nd layer are used as inputs to the third layer and so on.
The set of output signals of the neurons in the output layer of network constitutes
84
the overall response of network to the activation pattern supplied by source nodes in
the input first layer.
A feed forward neural network having one or more hidden layers with atleast
one feedback loop is known as recurrent network as shown in Fig. 4.9. The feedback
may be a self feedback, i.e., where output of neuron is fed back to its own input.
Sometimes, feedback loops involve the use of unit delay elements, which results in
nonlinear dynamic behaviour, assuming that neural network contains non linear
units.
85
Input layer Output Layer
A recurrent neural network has (at least one) cyclic path of synaptic
connections. Basic characteristics:
1. all biological neural networks are recurrent
2. mathematically, they implement dynamical systems
3. several types of training algorithms are known, no clear winner
4. theoretical and practical difficulties by and large have prevented practical
applications so far.
Once a network has been structured for a particular application, it is ready for
training. At the beginning, the initial weights are chosen randomly and then the
training or learning begins. There are two approaches to training; supervised and
unsupervised.
86
4.8.1 SUPERVISED TRAINING
In supervised training, both the inputs and the outputs are provided. The
network then processes the inputs and compares its resulting outputs against the
desired outputs. Errors are then propagated back through the system, causing the
system to adjust the weights, which control the network. This process occurs over
and over as the weights are continually tweaked. The set of data, which enables the
training, is called the "training set." During the training of a network, the same set of
data is processed many times, as the connection weights are ever refined.
Sometimes a network may never learn. This could be because the input data
does not contain the specific information from which the desired output is derived.
Networks also don't converge if there is not enough data to enable complete
learning. Ideally, there should be enough data so that part of the data can be held
back as a test. Many layered networks with multiple nodes are capable of
memorizing data. To monitor the network to determine if the system is simply
memorizing its data in some non-significant way, supervised training needs to hold
back a set of data to be used to test the system after it has undergone its training.
If a network simply can't solve the problem, the designer then has to review
the input and outputs, the number of layers, the number of elements per layer, the
connections between the layers, the summation, transfer, and training functions, and
even the initial weights themselves. Another part of the designer's creativity governs
the rules of training. There are many laws (algorithms) used to implement the
adaptive feedback required to adjust the weights during training. The most common
technique is known as back-propagation.
The training is not just a technique, but a conscious analysis, to insure that the
network is not over trained. Initially, an artificial neural network configures itself
87
with the general statistical trends of the data. Later, it continues to ‘learn’ about
other aspects of the data, which may be spurious from a general viewpoint.
When finally the system has been correctly trained and no further learning is
needed, the weights can, if desired, be ‘frozen’. In some systems, this finalized
network is then turned into hardware so that it can be fast. Other systems don't lock
themselves in but continue to learn while in production use.
The other type is the unsupervised training (learning). In this type, the
network is provided with inputs but not with desired outputs. The system itself
must then decide what features it will use to group the input data. This is often
referred to as self-organization or adaption. These networks use no external
influences to adjust their weights. Instead, they internally monitor their
performance. These networks look for regularities or trends in the input signals, and
makes adaptations according to the function of the network. Even without being
told whether it's right or wrong, the network still must have some information about
how to organize itself. This information is built into the network topology and
learning rules. An unsupervised learning algorithm might emphasize cooperation
among clusters of processing elements. In such a scheme, the clusters would work
together. If some external input activated any node in the cluster, the cluster's
activity as a whole could be increased. Likewise, if external input to nodes in the
cluster was decreased, that could have an inhibitory effect on the entire cluster.
88
updated. Presently, the unsupervised learning is not well understood and there
continues to be a lot of research in this aspect.
The rate at which ANNs learn depends upon several controllable factors. A
slower rate means more time to spend in producing an adequately trained system.
With faster learning rates, however, the network may not be able to make the fine
discriminations that are possible with a system learning slowly.
Most learning functions have some provision for a learning rate (learning
constant). Usually this term is positive and between 0 and 1. If the learning rate is
greater than 1, it is easy for the learning algorithm to overshoot in correcting the
weights, and the network will oscillate. Small values of the learning rate will not
correct the current error as quickly, but if small steps are taken in correcting errors,
there is a good chance of arriving at the best minimum convergence.
Many learning laws are in common use. Most of them are some sort of
variation of the best known and oldest ‘Hebb's Rule’.
Hebb's Rule: This was introduced by Donald Hebb in ‘Organization of Behavior’.
The basic rule is: If a neuron receives an input from another neuron and if both are
highly active (same sign), the weight between the two neurons should be
strengthened.
Hopfield Law: If the desired output and the input are both active or both inactive,
increment the connection weight by the learning rate, otherwise decrement the
weight by the learning rate.
89
The Delta Rule: This rule is based on the simple idea of continuously modifying the
strengths of the input connections to reduce the difference (the delta) between the
desired output value and the actual output of a processing element.
The Gradient Descent Rule: This is similar to Delta Rule in that, the derivative of
the transfer function is still used to modify the delta error before it is applied to the
connection weights. However, an additional proportional constant tied to the
learning rate is appended to the final modifying factor acting upon the weight.
Kohonen's Law: In this, the processing elements compete for the opportunity to
learn or update their weights. The element with largest output is declared the
winner and has the capability of inhibiting its competitors as well as exciting its
neighbors. Only the winner is permitted an output and only the winner plus its
neighbors are allowed to adjust their connection weights.
an output value). The training data for a feedforward network training task consist
of T input-output (vector-valued) data pairs
u (n) = ( x10 (n),..., xK0 (n)) t , d (n) = (d1k +1 (n),..., d Lk +1 (n)) t ,
where ‘n’ denotes training instance. The activation of non-input units is computed
according to
90
xim+1 (n) = f ( wijm x j (n)).
j =1,..., N m
Presented with training input u(t), the previous update equation is used to
compute activations of units in subsequent hidden layers, until a network
response y (n) = ( x1k +1 (n),..., xLK +1 (n)) t is obtained in the output layer. The objective of
training is to find a set of network weights such that the summed squared error
2
E= d ( n) − y ( n ) = E (n) is minimized. This is done by incrementally
n =1,...,T n =1,....T
changing the weights along the direction of the error gradient with respect to
∂E ∂E (n)
weights = using a (small) learning rate γ:
∂wij t =1,....T ∂wijm
m
∂E
new wijm = wijm − γ .
∂wijm
This is the formula used in batch learning mode, where new weights are
computed after presenting all training samples. One such pass through all samples
is called an epoch. Before the first epoch, weights are initialized, typically to small
random numbers. A variant is incremental learning, where weights are changed
after presentation of individual training samples:
∂E (n)
new wijm = wijm − γ
∂wijm
∂E (n)
The subtask in this method is the computation of the error gradients .
∂wijm
91
2. Compute, by proceeding backward through m = k+1, k, ..., 1, for each unit xim the
∂f (u )
δ ik +1 (n) = (d i (n) − yi (n))
∂u u = zik +1
is the internal state (or potential) of unit xim . This is the error backpropagation pass.
Mathematically, the error propagation term δ im (n) represents the error gradient
∂E
∂u u = z mj
After every such epoch, compute the error. Stop when the error falls below a
predetermined threshold or when the change in error falls below another
predetermined threshold or when the number of epochs exceeds a predetermined
maximal number of epochs. Many (order of thousands in nontrivial tasks) such
epochs may be required until a sufficiently small error is achieved. One epoch
requires O(T M) multiplications and additions, where M is the total number of
network connections.
92
The basic gradient descent approach (and its backpropagation algorithm
implementation) is notorious for slow convergence, because the learning rate γ must
be typically chosen small to avoid instability. Another approach to achieve faster
convergence is to use second-order gradient descent techniques, which exploit
2
curvature of the gradient but have epoch complexity O(T M ).
Many of the networks being designed presently are statistically quite accurate
(upto 85% to 90% accuracy). Currently, neural networks are not the user interface,
which translates spoken words into instructions for a machine but some day it will
be achieved. VCRs, home security systems, CD players, and word processors will
simply be activated by voice. Touch screen and voice editing will replace the word
processors of today while bringing spreadsheets and data bases to a level of
usability. Neural network design is progressing in other more promising application
areas.
(i) Language Processing: These applications include text-to-speech conversion,
auditory input for machines, automatic language translation, secure voice keyed
locks, automatic transcription, aids for the deaf, aids for the physically disabled
which respond to voice commands and natural language processing.
(ii) Character Recognition: Neural network based products are available which can
recognize hand printed characters through a scanner. It is 98% accurate for numbers,
a little less for alphabetical characters. Quantum Neural Network software package
(Qnspec) is available for recognizing characters, including cursive.
(iii) Image (data) Compression: Neural networks can do real-time compression and
decompression of data. These networks can reduce eight bits of data to three and
then reverse that process upon restructuring to eight bits again.
93
(iv) Pattern Recognition: Many pattern recognition applications are in use like, a
system that can detect bombs in luggage at airports by identifying from small
variances and patterns from within specialized sensor's outputs, a back-propagation
neural network which can discriminate between a true and a false heart attack, a
network which can scan and also read the PAP smears etc. Many automated quality
control applications are now in use, which are based on pattern recognition.
(v) Signal Processing: Neural networks have proven capable of filtering out
electronic noise. Another application is a system that can detect engine misfire
simply from the engine sound.
(vi) Financial: Banks, credit card companies and lending institutions deal with many
decisions that are not clear-cut. They involve learning and statistical trends. Neural
networks are now trained on the data from past decisions and being used in
decision-making.
94
be possible by solving the model analytically. The rising interest in ANNs is largely
due to the emergence of powerful new methods as well as to the availability of
computational power suitable for simulation. The field is particularly exciting today
because ANN algorithms and architectures can be implemented in VLSI technology
for real-time applications.
The application of ANNs in many areas under electrical power systems has
lead to acceptable results.
1. Load Forecasting
Load forecasting is a very common and popular problem, which has an
important role in economic, financial, development, expansion and planning of
power systems.
• Short-term load forecasting over an interval ranging from an hour to a week is
important for various applications such as unit commitment, economic dispatch,
energy transfer scheduling and real time control.
• Mid-term load forecasting that range from one month to five years, is used to
purchase enough fuel for power plants after electricity tariffs are calculated.
• Long-term load forecasting (LTLF), covering from 5 to 20 years or more, is used by
planning engineers and economists to determine the type and the size of generating
plants that minimize both fixed and variable costs.
The ANNs can be used to solve these problems. Most of the projects using
ANNs have considered many factors such as weather condition, holidays, weekends
and sports matches days in forecasting model successfully. This is because of
learning ability of ANNs with many input factors. Main advantages of ANNs that
has increased their use in forecasting are:
1. Being conducted off-line without time constraints and direct coupling to power
system for data acquisition.
95
2. Ability to adjust the parameters for ANN inputs that does not have functional
relationship between them such as weather conditions and load profile.
96
3. Economic Dispatch
Main goal of economic dispatch (ED) consists of minimizing the operating
costs depending on demand and subject to certain constraints, i.e. how to allocate
the required load demand between the available generation units. In practice, the
whole of the unit operating range is not always available for load allocation due to
physical operation limitations. Several methods like Lagrangian relaxation method,
linear programming techniques, Beale’s quadratic programming, Newton method,
Lagrangian augmented function etc. are being used. The economic dispatch problem
is a nonconvex optimization problem. Neural networks and specially the Hopfield
model, have a well-demonstrated capability of solving combinational optimization
problem.
4. Security Assessment
The principle task of an electric power system is to deliver the power
requested by the customers, without exceeding acceptable voltage and frequency
limits. This task has to be solved in real time and in safe, reliable and economical
manner. Fig. 4.10 show a simplified diagram of the principle data flow in a power
system where real-time measurements are stored in a database. The state estimation
then adjusts bad and missing data. Based on the estimated values the current
mathematical model of the power system is established. Based on simulation of
potential equipment outage, the security level of the system is determined. If the
system is considered unsafe with respect to one or more potential outages, control
actions have to be taken.
97
Fig. 4.10: Data flow in power System Operation
Generally there are two types of security assessments: static and dynamic. In
both types different operational states are defined as follows:
• Normal or secure state: All customer demands are met and operating limit is
within presented limits.
• Alert or critical state: The system variables are still within limits and constrain are
satisfied, but little disturbance can lead to variable toward instability.
• Emergency or unsecure state: the power system enters the emergency mode of
operation upon violation of security related inequality constraints.
5. Alarm Processing
In emergencies, engineers are expected to quickly evaluate various options
and implement an optimal corrective action. However, the number of real-time
messages (alarms) received is too large for the time available for their evaluation.
Processing such alarms in real-time and alerting the operator to the root cause or the
98
most important of these alarms has been identified as a valuable operational aid.
ANNs have been implemented for such alarm processing. The fast response of a
trained ANN and its generalization abilities are very useful in this application.
99
• Load frequency control (Automatic Generation Control)
• Hydroelectric Generation Scheduling:
• Power System Stabilizer Design:
• Load Flow studies
• Load modeling
• HVDC
• Non conventional energy field
• Maintenance scheduling
• Unit commitment
Main advantages of using ANNs in power system applications are:
• Their capability of dealing with stochastic variations of the scheduled operating
point with increasing data
• Very fast and on-line processing and classification
• Implicit nonlinear modeling and filtering of system data
The artificial neural networks have been dealt with in details in this chapter.
In this research work, the ANN controllers based on optimal and suboptimal control
strategy have been designed and developed for various types of interconnected
power systems, the procedure for which has been discussed in chapter 5.
Since these ANN controllers have been trained with a data obtained from
optimal and suboptimal control strategies, they have given the results comparable to
that of optimal and suboptimal control. The actual results have been shown and
discussed in chapter 6.
100