


Downloaded from be.rgpvnotes.in

UNIT-1 Notes

Introduction to soft computing:


Soft computing, as opposed to traditional computing, deals with approximate models and gives
solutions to complex real-life problems. Soft computing is tolerant of imprecision, uncertainty,
partial truth, and approximations. In effect, the role model for soft computing is the human
mind.
The principal constituents of Soft Computing (SC) are Fuzzy Logic (FL), Neural Computing (NC),
Evolutionary Computation (EC), Machine Learning (ML) and Probabilistic Reasoning (PR), with
the latter subsuming belief networks, chaos theory and parts of learning theory.

Soft computing vs. hard computing:


The following points clearly differentiate the two:
1. Soft computing is tolerant of imprecision, uncertainty, partial truth and approximation,
whereas hard computing requires a precisely stated analytical model.
2. Soft computing is based on fuzzy logic, neural nets and probabilistic reasoning, whereas hard
computing is based on binary logic, crisp systems, numerical analysis and crisp software.
3. Soft computing is approximate and dispositional, whereas hard computing is precise and
categorical.
4. Soft computing can evolve its own programs, whereas hard computing requires programs to
be written.
5. Soft computing can use multi-valued or fuzzy logic, whereas hard computing uses two-valued
logic.

Various types of soft computing techniques:


Soft computing is a fusion of three different methodologies as follows:
Soft computing = Fuzzy Logic + Neural Network + Genetic Algorithm
Brief details about the above-mentioned techniques are as follows:
1. Fuzzy Logic:
Fuzzy logic is intended to model logical reasoning with vague or imprecise statements like "Petr
is young (rich, tall, hungry, etc.)". It refers to a family of many-valued logics (see entry on many-
valued logic) and thus stipulates that the truth value (which, in this case, amounts to a degree of
truth) of a logically compound proposition, like "Carles is tall and Chris is rich", is determined by
the truth values of its components. In other words, like in classical logic, one imposes truth-
functionality.
Fuzzy logic emerged in the context of the theory of fuzzy sets, introduced by Zadeh (1965). A
fuzzy set assigns a degree of membership, typically a real number from the interval [0,1], to
elements of a universe. Fuzzy logic arises by assigning degrees of truth to propositions. The
standard set of truth values (degrees) is [0,1], where 0 represents "totally false", 1 represents
"totally true", and the other numbers refer to partial truth, i.e., intermediate degrees of truth.
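As an illustration of degrees of membership, the sketch below defines a fuzzy set "tall" over heights. The piecewise-linear shape and the 160/190 cm breakpoints are invented for this example, not taken from the text:

```python
def tall_membership(height_cm):
    """Degree to which a height belongs to the fuzzy set 'tall'.

    0 below 160 cm, 1 above 190 cm, linear in between
    (the breakpoints are illustrative assumptions).
    """
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / (190 - 160)

# Degrees of truth for the proposition "x is tall":
print(tall_membership(150))  # 0.0 - totally false
print(tall_membership(175))  # 0.5 - partially true
print(tall_membership(195))  # 1.0 - totally true
```

Any value strictly between 0 and 1 is an intermediate degree of truth, exactly as described above.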

Page no: 1 Follow us on facebook to get real-time updates from RGPV



Fuzzy logic is often understood in a very wide sense which includes all kinds of formalisms
and techniques referring to the systematic handling of degrees of some kind. In particular, in
engineering contexts (fuzzy control, fuzzy classification, soft computing) it is aimed at efficient
computational methods tolerant to suboptimality and imprecision. It focuses on logics based
on a truth-functional account of partial truth and studies them in the spirit of classical
mathematical logic (syntax, model-theoretic semantics, proof systems, completeness, etc.;
both at the propositional and the predicate level).

2. Neural Network:
Neural Networks (NNs) are also known as Artificial Neural Networks (ANNs), Connectionist
Models, and Parallel Distributed Processing (PDP) Models. Artificial Neural Networks are
massively parallel interconnected networks of simple (usually adaptive) elements and their
hierarchical organizations which are intended to interact with the objects of the real world in
the same way as biological nervous systems do. Fine-grained, parallel, distributed computing
model characterized by:

 A large number of very simple, neuron-like processing elements called units, PEs, or
nodes
 A large number of weighted, directed connections between pairs of units
 Weights may be positive or negative real values
 Local processing in that each unit computes a function based on the outputs of a limited
number of other units in the network
 Each unit computes a simple function of its input values, which are the weighted
outputs from other units. If there are n inputs to a unit, then the unit's output, or
activation is defined by a = g((w1 * x1) + (w2 * x2) + ... + (wn * xn)). Thus each unit
computes a (simple) function g of the linear combination of its inputs.
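The unit computation a = g((w1 * x1) + ... + (wn * xn)) can be sketched directly; the choice of a step function for g and the example weights are illustrative assumptions:

```python
def unit_output(weights, inputs, g):
    """a = g((w1 * x1) + (w2 * x2) + ... + (wn * xn))."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return g(net)

# Illustrative choice of g: a step (threshold) function.
step = lambda net: 1 if net > 0 else 0

print(unit_output([0.5, -0.5], [1, 0], step))  # 1: positive net input
print(unit_output([0.5, -0.5], [0, 1], step))  # 0: negative net input
```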

3. Genetic Algorithm:
Genetic algorithm is basically a randomized search by simulating evolution, starting from an
initial set of solutions or hypotheses, and generating successive "generations" of solutions. This
particular branch of AI was inspired by the way living things evolved into more successful
organisms in nature. The main idea is survival of the fittest, a.k.a. natural selection.
A chromosome is a long, complicated thread of DNA (deoxyribonucleic acid). Hereditary factors
that determine particular traits of an individual are strung along the length of these
chromosomes, like beads on a necklace. Each trait is coded by some combination of DNA (there
are four bases: A (Adenine), C (Cytosine), T (Thymine) and G (Guanine)). Like an alphabet in a
language, meaningful combinations of the bases produce specific instructions to the cell.
Changes occur during reproduction. The chromosomes from the parents exchange randomly by
a process called crossover. Therefore, the offspring exhibit some traits of the father and some
traits of the mother.
A rarer process called mutation also changes some traits. Sometimes an error may occur during
copying of chromosomes (mitosis). The parent cell may have -A-C-G-C-T- but an accident may
occur and changes the new cell to -A-C-T-C-T-. Much like a typist copying a book, sometimes a
few mistakes are made. Usually this results in a nonsensical word and the cell does not survive.


But over millions of years, sometimes the accidental mistake produces a more beautiful phrase
for the book, thus producing a better species.
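The crossover and mutation processes described above can be sketched on simple base strings; the five-base chromosomes, crossover point, and mutation rate below are illustrative assumptions:

```python
import random

def crossover(parent1, parent2, point):
    """Exchange chromosome segments between two parents at a crossover point."""
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chromosome, rate, rng):
    """Randomly replace bases, mimicking errors made while copying."""
    bases = "ACGT"
    return "".join(rng.choice(bases) if rng.random() < rate else base
                   for base in chromosome)

child1, child2 = crossover("ACGCT", "TTGGA", point=2)
print(child1, child2)  # ACGGA TTGCT

rng = random.Random(0)
print(mutate("ACGCT", rate=0.2, rng=rng))  # a copy with possible "typing mistakes"
```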

Applications of soft computing


The applications of Soft Computing have proved two main advantages:
1. First, in solving nonlinear problems where mathematical models are not available or
not possible.
2. Second, introducing human knowledge such as cognition, recognition, understanding,
learning, and others into the fields of computing.
This resulted in the possibility of constructing intelligent systems such as autonomous self-
tuning systems and automated design systems.
The relevance of soft computing for pattern recognition and image processing has been
established over the last few years. Soft computing has recently gained importance
because of its potential applications in problems like:
- Remotely Sensed Data Analysis,
- Data Mining, Web Mining,
- Global Positioning Systems,
- Medical Imaging,
- Forensic Applications,
- Optical Character Recognition,
- Signature Verification,
- Multimedia,
- Target Recognition,
- Face Recognition and
- Man Machine Communication.

Neural Network: Structure of a single Biological neuron:


Artificial NNs draw much of their inspiration from the biological nervous system. It is therefore
very useful to have some knowledge of the way this system is organized.
Most living creatures, which have the ability to adapt to a changing environment, need a
controlling unit which is able to learn. Higher developed animals and humans use very complex
networks of highly specialized neurons to perform this task.
The control unit - or brain - can be divided in different anatomic and functional sub-units, each
having certain tasks like vision, hearing, motor and sensor control. The brain is connected by
nerves to the sensors and actors in the rest of the body.
The brain consists of a very large number of neurons, about 10^11 on average. These can be seen
as the basic building bricks for the central nervous system (CNS). The neurons are
interconnected at points called synapses. The complexity of the brain is due to the massive
number of highly interconnected simple units working in parallel, with an individual neuron
receiving input from up to 10000 others.
The neuron contains all structures of an animal cell. The complexity of the structure and of the
processes in a simple cell is enormous. Even the most sophisticated neuron models in artificial
neural networks seem comparatively toy-like.


Structurally the neuron can be divided in three major parts: the cell body (soma), the dendrites,
and the axon, see Figure 1.1 for an illustration:

Fig 1.1: Biological Neuron Structure

The cell body contains the organelles of the neuron, and the dendrites also originate there.
These are thin and widely branching fibers, reaching out in different directions to make
connections to a large number of cells within the cluster.

Input connections are made from the axons of other cells to the dendrites or directly to the body
of the cell. These junctions are known as synapses.
There is only one axon per neuron. It is a single and long fiber, which transports the output
signal of the cell as electrical impulses (action potential) along its length. The end of the axon
may divide in many branches, which are then connected to other cells. The branches have the
function to fan out the signal to many other inputs.
There are many different types of neuron cells found in the nervous system. The differences are
due to their location and function.

Function of a Biological neuron:


The neurons perform basically the following function: all the inputs to the cell, which may vary
by the strength of the connection or the frequency of the incoming signal, are summed up. The
input sum is processed by a threshold function and produces an output signal.
The brain works in both a parallel and serial way. The parallel and serial nature of the brain is
readily apparent from the physical anatomy of the nervous system. That there is serial and
parallel processing involved can be easily seen from the time needed to perform tasks. For
example a human can recognize the picture of another person in about 100 ms. Given the
processing time of 1 ms for an individual neuron this implies that a certain number of neurons,
but fewer than 100, are involved in series; whereas the complexity of the task is evidence for
parallel processing, because a difficult recognition task cannot be performed by such a small
number of neurons. This phenomenon is known as the 100-step rule.
Biological neural systems usually have a very high fault tolerance. Experiments with people with
brain injuries have shown that damage of neurons up to a certain level does not necessarily


influence the performance of the system, though tasks such as writing or speaking may have to
be learned again. This can be regarded as re-training the network.
In the following work no particular brain part or function will be modeled. Rather the
fundamental brain characteristics of parallelism and fault tolerance will be applied.

Artificial Neuron and Definition of ANN


Artificial Neural Networks are the biologically inspired simulations performed on the computer
to perform certain specific tasks like clustering, classification, pattern recognition etc. Artificial
Neural Networks, in general, are biologically inspired networks of artificial neurons configured
to perform specific tasks.
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the
way biological nervous systems, such as the brain, process information. The key element of this
paradigm is the novel structure of the information processing system. It is composed of a large
number of highly interconnected processing elements (neurons) working in unison to solve
specific problems. ANNs, like people, learn by example. An ANN is configured for a specific
application, such as pattern recognition or data classification, through a learning process.
Learning in biological systems involves adjustments to the synaptic connections that exist
between the neurons. This is true of ANNs as well.

Function of ANN:
Following points describes the function of ANN:

Fig 1.2: Artificial Neuron Structure

-Artificial neural networks can be viewed as weighted directed graphs in which artificial neurons
are nodes and directed edges with weights are connections between neuron outputs and
neuron inputs; see figure 1.2.
-The Artificial Neural Network receives input from the external world in the form of pattern and
image in vector form. These inputs are mathematically designated by the notation x(n) for n
number of inputs.


-Each input is multiplied by its corresponding weights. Weights are the information used by the
neural network to solve a problem. Typically weight represents the strength of the
interconnection between neurons inside the neural network.
-The weighted inputs are all summed up inside the computing unit (artificial neuron). In case the
weighted sum is zero, a bias is added to make the output non-zero or to scale up the system
response. The bias has a weight and an input always equal to 1.
-The sum corresponds to any numerical value ranging from 0 to infinity. In order to limit the
response to the desired value, a threshold value is set up. For this, the sum is passed through an
activation function.
-The activation function is the transfer function used to get the desired output. There are
linear as well as non-linear activation functions. Two commonly used activation functions are
the binary and the sigmoidal (non-linear) functions:
1. Binary — the output has only two values, either 0 or 1. For this, a threshold value is set up. If
the net weighted input is greater than the threshold, the output is assumed to be 1, otherwise zero.
2. Sigmoidal (Hyperbolic) — this function has an "S"-shaped curve. Here the tan hyperbolic
function is used to approximate output from net input. The function is defined as
f(x) = 1 / (1 + exp(-βx)), where β is the steepness parameter.
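A minimal sketch of the two activation functions just described; the threshold of 1 and the steepness β = 1 are illustrative choices:

```python
import math

def binary(net, threshold=1.0):
    """Binary activation: 1 if the net weighted input exceeds the threshold,
    otherwise 0."""
    return 1 if net > threshold else 0

def sigmoid(net, beta=1.0):
    """Sigmoidal activation: f(x) = 1 / (1 + exp(-beta * x))."""
    return 1.0 / (1.0 + math.exp(-beta * net))

print(binary(1.5))             # 1
print(binary(0.5))             # 0
print(round(sigmoid(0.0), 2))  # 0.5 - the middle of the S-shaped curve
```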

Taxonomy of neural net:


Figure 1.3 classifies neural network taxonomy as:


Fig 1.3: Taxonomy of neural network

Descriptions of each neural network taxonomy are as follows:

1. Perceptron — Neural network having two input units and one output unit, with no hidden
layers. These are also known as single-layer perceptrons.

2. Radial Basis Function Network — These networks are similar to the feed-forward neural
network, except that a radial basis function is used as the activation function of the neurons.

3. Multilayer Perceptron — These networks use more than one hidden layer of neurons, unlike
the single-layer perceptron. These are also known as deep feedforward neural networks.

4. Recurrent Neural Network — Type of neural network in which hidden layer neurons have self-
connections. Recurrent neural networks possess memory. At any instance, a hidden layer neuron
receives activation from the lower layer as well as its previous activation value.

5. Long/Short Term Memory Network (LSTM) — Type of neural network in which a memory cell
is incorporated inside hidden layer neurons.

6. Hopfield Network — A fully interconnected network of neurons in which each neuron is
connected to every other neuron. The network is trained with an input pattern by setting a value
of neurons to the desired pattern. Then its weights are computed. The weights are not
changed. Once trained for one or more patterns, the network will converge to the learned
patterns. It is different from other neural networks.

7. Boltzmann Machine Network — These networks are similar to the Hopfield network, except
that some neurons are input while others are hidden in nature. The weights are initialized
randomly and adjusted through a stochastic learning algorithm.

8. Convolutional Neural Network — A feedforward network in which convolutional layers apply
learned filters across the input, commonly used for image data.

Difference between ANN and human brain:


The following points differentiate an ANN from the biological (real) neural network:

Characteristic    | Artificial Neural Network                          | Biological (Real) Neural Network
Speed             | Faster in processing information; response time is in nanoseconds. | Slower in processing information; response time is in milliseconds.
Processing        | Serial processing.                                 | Massively parallel processing.
Size & Complexity | Less size and complexity; does not perform complex pattern recognition tasks. | Highly complex and dense network of interconnected neurons, of the order of 10^11 neurons with 10^15 interconnections.
Storage           | Information storage is replaceable: new data can be added by deleting old data. | Information storage is adaptable: new information is added by adjusting the interconnection strengths without destroying old information.
Fault tolerance   | Fault intolerant: information once corrupted cannot be retrieved in case of failure of the system. | Fault tolerant: performance degrades gracefully even when some neurons are damaged.
Control Mechanism | There is a control unit for controlling computing activities. | No specific control mechanism external to the computing task.

Characteristics of ANN:
Following are the characteristics of ANN:
1. Cognitive architecture
1.1 Sensory store
Activity states of neurons are organized with the help of excitatory and inhibitory signals
exchanged between them. Sensory information along with stored data results in
convergence to a stationary state which lasts a fraction of a second. (Sensory store of time-
scale tuned to typical progression of events in real life)
1.2 Short-term memory = firing of neurons
1.3 Long-term memory = synaptic strength
LTM is reorganized with the help of synaptic plasticity, e.g. Hebb's rule - connections are
strengthened when both neurons are active together. The effect is to increase the likelihood
of those activity states that occurred in the past.
1.4 Associative Memory (found in drosophila and snails) - an optimization problem.
1.5 Content-Addressability of Memory
1.6 Constraint Satisfaction Processing (different from search-based techniques of classical AI)
2. Not so good at deductive logic
3. Performance improves with practice
4. Possibility of unsupervised and supervised learning; ability to use external feedback on the
desirability of present activity state
5. Dealing with uncertainty, imprecision, noise - imitates human flexibility.
6. Spontaneous generalization - classification by similarity rather than by property lists. The use
of prototypes is seen in humans in categorization tasks calling for, say, a list of examples of birds,
response times to verify "___ is a bird", and checking defaults in comprehension: I saw a "bird"
on the grass.


7. Might offer new explanations for some puzzling discontinuities in cognitive development.
8. Some philosophers are even hopeful that neural networks, through their patterns of neuron
firings, may shed light on how cognition works.
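Hebb's rule mentioned under long-term memory above ("connections are strengthened when both neurons are active together") can be written as a one-line weight update; the learning rate of 0.1 and the activation values are illustrative assumptions:

```python
def hebb_update(w, pre, post, eta=0.1):
    """Strengthen a connection in proportion to the product of the
    simultaneous activations of the two units it connects."""
    return w + eta * pre * post

w = 0.5
w = hebb_update(w, pre=1.0, post=1.0)  # both units active: weight grows
print(round(w, 2))  # 0.6
w = hebb_update(w, pre=1.0, post=0.0)  # one unit silent: no change
print(round(w, 2))  # 0.6
```

The effect is exactly the one described above: activity states that co-occurred in the past become more likely to recur.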

Applications of ANN:
ANN can be used in following areas:

Application                  | Architecture / Algorithm                                                    | Activation Function
Process modeling and control | Radial Basis Network                                                        | Radial Basis
Machine Diagnostics          | Multilayer Perceptron                                                       | Tan-Sigmoid Function
Portfolio Management         | Classification Supervised Algorithm                                         | Tan-Sigmoid Function
Target Recognition           | Modular Neural Network                                                      | Tan-Sigmoid Function
Medical Diagnosis            | Multilayer Perceptron                                                       | Tan-Sigmoid Function
Credit Rating                | Logistic Discriminant Analysis with ANN, Support Vector Machine             | Logistic function
Targeted Marketing           | Back Propagation Algorithm                                                  | Logistic function
Voice Recognition            | Multilayer Perceptron, Deep Neural Networks (Convolutional Neural Networks) | Logistic function
Financial Forecasting        | Backpropagation Algorithm                                                   | Logistic function
Intelligent Searching        | Deep Neural Network                                                         | Logistic function
Fraud Detection              | Gradient Descent Algorithm and Least Mean Square (LMS) algorithm            | Logistic function

Single layer network:


The true computing power of neural networks comes from connecting multiple neurons,
though even a single neuron can perform a substantial amount of computation. The most
common structure for connecting neurons into a network is by layers.
The simplest form of layered network is shown in figure 1.4. The shaded nodes on the left are in
the so-called input layer. The input layer neurons only pass and distribute the inputs and
perform no computation. Thus, the only true layer of neurons is the one on the right. Each of
the inputs x1, x2, x3, ..., xN is connected to every artificial neuron in the output layer


through a connection weight. Since every output value y1, y2, y3, ..., yN is calculated from
the same set of input values, the outputs differ only through their connection weights.

Fig 1.4: Single Layer ANN
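The single-layer computation above can be sketched as a matrix of weights, one row per output unit; the weight values and the choice of a step activation are illustrative assumptions:

```python
def single_layer(inputs, weights, g):
    """Each output y_j = g(sum_i w_ji * x_i): every input x_i is connected
    to every output unit j through the connection weight w_ji."""
    return [g(sum(w * x for w, x in zip(row, inputs))) for row in weights]

step = lambda net: 1 if net > 0 else 0

# 3 inputs feeding 2 output units; each row holds one unit's weights.
W = [[0.5, -0.2, 0.1],
     [-0.4, 0.3, 0.9]]
print(single_layer([1, 0, 0], W, step))  # [1, 0]
```

Both outputs are computed from the same input vector; they differ only because of their connection weights, exactly as stated above.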

Perceptron training algorithm


A perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain
threshold it returns a signal, and otherwise remains silent. What made this a machine
learning algorithm was Frank Rosenblatt's idea of the perceptron learning rule: the perceptron
algorithm is about learning the weights for the input signals in order to draw a linear decision
boundary that allows us to discriminate between the two linearly separable classes +1 and -1.
In a perceptron, we define the update-weights function in the learning algorithm by the
formula:
wi = wi + delta_wi
where delta_wi = alpha * (T - O) * xi; xi is the input associated with the ith input unit, T is the
target output, O is the actual output, and alpha is a constant between 0 and 1 called the
learning rate.
Notes about this update formula:
 Based on a basic idea due to Hebb that the strength of a connection between two units
should be adjusted in proportion to the product of their simultaneous activations. A
product is used as a means of measuring the correlation between the values output by
the two units.
 Also called the Delta Rule or the Widrow-Hoff Rule
 "Local" learning rule in that only local information in the network is needed to update a
weight
 Performs gradient descent in "weight space" in that if there are n weights in the
network, this rule will be used to iteratively adjust all of the weights so that at each
iteration (training example) the error is decreasing (more correctly, the error is
monotonically non-increasing)
 Correct output (T = O) causes no change in a weight
 xi = 0 causes no change in weight


 Does not depend on wi


 If T=1 and O=0, then increase the weight so that hopefully next time the result will
exceed the threshold at the output unit and cause the output O to be 1
 If T=0 and O=1, then decrease the weight so that hopefully next time the result will be
below the threshold and cause the output to be 0.
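The update rule wi = wi + alpha * (T - O) * xi can be put into a complete training loop. The AND-gate data set, learning rate, and bias handling below are illustrative assumptions; any linearly separable data would do:

```python
def train_perceptron(data, alpha=0.1, epochs=20):
    """Perceptron learning rule: w_i += alpha * (T - O) * x_i.

    Each example is ([x1, x2], target); a bias input fixed at 1 is appended.
    """
    w = [0.0, 0.0, 0.0]  # weights for x1, x2 and the bias input
    for _ in range(epochs):
        for xs, target in data:
            xs = xs + [1]  # bias input, always equal to 1
            out = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            w = [wi + alpha * (target - out) * xi for wi, xi in zip(w, xs)]
    return w

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(AND)

def predict(xs):
    return 1 if sum(wi * xi for wi, xi in zip(w, xs + [1])) > 0 else 0

print([predict(xs) for xs, _ in AND])  # [0, 0, 0, 1] - the AND targets
```

Correct outputs (T = O) leave the weights unchanged, so once the boundary separates the two classes, training stops moving.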

Linear separability
A perceptron cannot handle tasks which are not linearly separable.
- Definition: sets of points in 2-D space are linearly separable if the sets can be separated by a
straight line.
- Generalizing, a set of points in n-dimensional space is linearly separable if there is a
hyperplane of (n-1) dimensions that separates the sets.
Example: XOR Gate, Figure 1.5:

Fig 1.5: XOR Gate and Linear Separability
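The XOR case can be checked empirically: since the XOR points are not linearly separable, no single linear threshold unit classifies all four correctly. The brute-force weight grid below is an illustrative assumption:

```python
from itertools import product

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    """Fraction of XOR points a single linear threshold unit gets right."""
    hits = 0
    for (x1, x2), target in XOR:
        out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        hits += (out == target)
    return hits / 4

# Exhaustive search over a weight grid: nothing reaches 100%.
grid = [k / 2 for k in range(-6, 7)]  # -3.0 ... 3.0 in steps of 0.5
best = max(accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)  # 0.75 - at most 3 of the 4 XOR points, never all of them
```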


UNIT-2 Notes

Introduction of MLP (Multi Layer Perceptron)


1. The multilayer perceptron (MLP) is a hierarchical structure of several perceptrons, and
overcomes the shortcomings of single-layer networks; see figure 1.7.
2. The multilayer perceptron is an artificial neural network that learns nonlinear function
mappings. The multilayer perceptron is capable of learning a rich variety of nonlinear decision
surfaces.
3. Nonlinear functions can be represented by multilayer perceptrons with units that use
nonlinear activation functions. Multiple layers of cascaded linear units still produce only linear
mappings.
4. A neural network with one or more layers of nodes between the input and the output nodes
is called a multilayer network.
5. The multilayer network structure, or architecture, or topology, consists of an input layer, one
or more hidden layers, and one output layer. The input nodes pass values to the first hidden
layer, its nodes to the second, and so on, until producing the outputs.
6. A network with a layer of input units, a layer of hidden units and a layer of output units is a
two-layer network. A network with two layers of hidden units is a three-layer network, and so
on. A justification for this is that the layer of input units is used only as an input channel and can
therefore be discounted.
A two-layer neural network implements the function:
f(x) = s( Σj wjk · s( Σi wij · xi + w0j ) + w0k )
where: x is the input vector,
w0j and w0k are the thresholds,
wij are the weights connecting the input with the hidden nodes,
wjk are the weights connecting the hidden with the output nodes, and
s is the sigmoid activation function.
These are the hidden units that enable the multilayer network to learn complex tasks by
extracting progressively more meaningful information from the input examples.
7. The multilayer network MLP has a highly connected topology since every input is connected
to all nodes in the first hidden layer, every unit in the hidden layers is connected to all nodes in
the next layer, and so on.
8. The input signals, initially these are the input examples, propagate through the neural
network in a forward direction on a layer-by-layer basis, that is why they are often called feed
forward multilayer networks.
Two kinds of signals pass through these networks:
- function signals: the input examples propagated through the hidden units and processed by
their activation functions emerge as outputs;
- error signals: the errors at the output nodes are propagated backward layer-by-layer through
the network so that each node returns its error back to the nodes in the previous hidden layer.
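A minimal forward-pass sketch of the two-layer function f(x) = s(Σj wjk · s(Σi wij · xi + w0j) + w0k); the network size and weight values are illustrative assumptions:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Two-layer MLP: a hidden layer then an output layer, both sigmoidal.

    W_hidden[j][i] connects input i to hidden node j;
    W_out[k][j] connects hidden node j to output node k.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W_out, b_out)]

# 2 inputs -> 2 hidden nodes -> 1 output node.
y = mlp_forward([1.0, 0.0],
                W_hidden=[[0.5, -0.5], [0.3, 0.8]], b_hidden=[0.0, -0.1],
                W_out=[[1.0, -1.0]], b_out=[0.2])
print(y)  # a single output strictly between 0 and 1
```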


Fig 1.7: MLP Neural Network

Different activation functions:


An activation function performs a mathematical operation on the signal output. It decides the
criteria and nature of the output to be generated. The two most common activation functions are:
1. Threshold Function
A step function of the kind likely used by the original Perceptron. The output is a certain
value, A1, if the input sum is above a certain threshold, and A0 if the input sum is below that
threshold. The values used by the Perceptron were A1 = 1 and A0 = 0; see figure 1.8.

Fig 1.8: Threshold Function

2. Sigmoidal (S shaped) function,


A wide variety of sigmoid functions have been used as the activation function of artificial
neurons, including the logistic and hyperbolic tangent functions, see figure 1.9.

Fig 1.9: Sigmoidal Function

Error back propagation algorithm


1. This algorithm was discovered and rediscovered a number of times. The mathematical details
of the derivation of the backpropagation equations are omitted here. (They are covered in
COMP9444 Neural Networks.)
2. Like perceptron learning, back-propagation attempts to reduce the errors between the
output of the network and the desired result.
3. However, assigning blame for errors to hidden nodes (i.e. nodes in the intermediate layers),
is not so straightforward. The error of the output nodes must be propagated back through the
hidden nodes.
4. The contribution that a hidden node makes to an output node is related to the strength of
the weight on the link between the two nodes and the level of activation of the hidden node
when the output node was given the wrong level of activation.
5. This can be used to estimate the error value for a hidden node in the penultimate layer, and
that can, in turn, be used in making error estimates for earlier layers; see figure 1.9.

Derivation of EBPA

 The basic algorithm can be summed up in the following equation (the delta rule) for the
change to the weight wji from node i to node j:

Δwji = η × δj × yi

i.e. weight change = learning rate η × local gradient δj × input signal yi to node j.

 where the local gradient δj is defined as follows:


1. If node j is an output node, then δj is the product of φ'(vj) and the error signal ej, where
φ(·) is the logistic function, vj is the total input to node j (i.e. Σi wji·yi), and ej is the
error signal for node j (i.e. the difference between the desired output and the actual
output);
2. If node j is a hidden node, then δj is the product of φ'(vj) and the weighted sum of the
δ's computed for the nodes in the next hidden or output layer that are connected to
node j.
3. [The actual formula is δj = φ'(vj) Σk δk·wkj, where k ranges over those nodes for
which wkj is non-zero (i.e. nodes k that actually have connections from node j). The δk
values have already been computed, as they are in the output layer (or a layer closer to
the output layer than node j).]
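The two δj cases can be sketched directly, using the φ'(vj) = yj(1 - yj) form of the logistic derivative noted below; the numeric values are illustrative assumptions:

```python
def logistic_deriv(y):
    """phi'(vj) expressed through the node's output: yj * (1 - yj)."""
    return y * (1.0 - y)

def output_delta(y, d):
    """Case 1: delta_j = phi'(vj) * ej, with error signal ej = d - y."""
    return logistic_deriv(y) * (d - y)

def hidden_delta(y, downstream):
    """Case 2: delta_j = phi'(vj) * sum_k delta_k * w_kj.

    `downstream` pairs each connected next-layer delta_k with its w_kj.
    """
    return logistic_deriv(y) * sum(dk * wkj for dk, wkj in downstream)

d_out = output_delta(y=0.8, d=1.0)                      # output node, target 1
d_hid = hidden_delta(y=0.6, downstream=[(d_out, 0.5)])  # one downstream node
print(round(d_out, 4), round(d_hid, 4))  # 0.032 0.0038
```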

Fig 1.9: Error Back Propagation Network

-Two Passes of Computation


FORWARD PASS: weights fixed, input signals propagated through network and outputs
calculated. Outputs oj are compared with desired outputs dj; the error signal ej = dj - oj is
computed.
BACKWARD PASS: starts with the output layer and recursively computes the local gradient δj for
each node. Then the weights are updated using the equation above for Δwji, and back to
another forward pass.
-Sigmoidal Nonlinearity
With the sigmoidal function φ(x) defined above, it is the case that φ'(vj) = yj(1 - yj), a fact that
simplifies the computations.

Momentum


 If the learning rate η is very small, then the algorithm proceeds slowly, but accurately
follows the path of steepest descent in weight space.
 If η is largish, the algorithm may oscillate ("bounce off the canyon walls").
A simple method of effectively increasing the rate of learning is to modify the delta rule
by including a momentum term:
Δwji(n) = α Δwji(n-1) + η δj(n) yi(n)
where α is a positive constant termed the momentum constant. This is called the
generalized delta rule.
 The effect is that if the basic delta rule is consistently pushing a weight in the same
direction, then it gradually gathers "momentum" in that direction.
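The generalized delta rule can be sketched as a running update; the gradient, input, and constant values below are illustrative assumptions:

```python
def momentum_step(prev_dw, delta_j, y_i, eta=0.1, alpha=0.9):
    """Generalized delta rule: dw(n) = alpha * dw(n-1) + eta * delta_j * y_i."""
    return alpha * prev_dw + eta * delta_j * y_i

dw = 0.0
for _ in range(5):  # the same gradient pushed repeatedly in one direction
    dw = momentum_step(dw, delta_j=0.5, y_i=1.0)
print(round(dw, 4))  # 0.2048 - well past the plain step eta*delta_j*y_i = 0.05
```

The step size grows toward η δ y / (1 - α) as the same direction keeps being pushed, which is the "gathering momentum" effect described above.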

-Stopping Criterion
Two commonly used stopping criteria are:
 stop after a certain number of runs through all the training data (each run through all
the training data is called an epoch);
 stop when the total sum-squared error reaches some low level. By total sum-squared
error we mean Σp Σi ei², where p ranges over all of the training patterns and i ranges over
all of the output units.
-Initialization
 The weights of a network to be trained by backprop must be initialized to some non-
zero values.
 The usual thing to do is to initialize the weights to small random values.
 The reason for this is that sometimes backprop training runs become "lost" on a plateau
in weight-space, or for some other reason backprop cannot find a good minimum error
value.
 Using small random values means different starting points for each training run, so that
subsequent training runs have a good chance of finding a suitable minimum.
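A minimal sketch of this initialization scheme (the scale 0.1 is an assumed value, and the helper name is our own):

```python
import random

def init_weights(n_in, n_out, scale=0.1, seed=None):
    """Initialize an n_in x n_out weight matrix with small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_out)]
            for _ in range(n_in)]

# Two training runs start from different points in weight space.
w1 = init_weights(3, 2, seed=1)
w2 = init_weights(3, 2, seed=2)
```

Seeding differently per run is what gives each run its own starting point, so a run that gets lost on a plateau can simply be restarted.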

Limitation of EBPA
A back-propagation neural network is only practical in certain situations. Following are some
guidelines on when you should use another approach:
 Can you write down a flow chart or a formula that accurately describes the problem? If
so, then stick with a traditional programming method.
 Is there a simple piece of hardware or software that already does what you want? If so,
then the development time for a NN might not be worth it.
 Do you want the functionality to "evolve" in a direction that is not pre-defined? If so,
then consider using a Genetic Algorithm (that's another topic!).
 Do you have an easy way to generate a significant number of input/output examples of
the desired behavior? If not, then you won't be able to train your NN to do anything.
• Is the problem very "discrete"? Can the correct answer be found in a look-up table of
reasonable size? A look-up table is much simpler and more accurate.
• Are precise numeric output values required? NNs are not good at giving precise numeric
answers.


Characteristics of EBPA
There are the characteristics of EBPA:
1. Backprop builds on linear threshold units, the neural network computation paradigm in
general, and the use of the logistic function (or similar functions) to transform weighted sums
of inputs into a neuron's output.
2. Backprop's performance on the XOR problem was demonstrated using the tlearn backprop
simulator.
3. A number of refinements to backprop were looked at briefly, including momentum and a
technique to obtain the best generalization ability.
4. Backprop nets learn slowly but compute quickly once they have learned.
5. They can be trained so as to generalize reasonably well.

Applications of EBPA
 Backprop tends to work well in some situations where human experts are unable to
articulate a rule for what they are doing - e.g. in areas depending on raw perception,
and where it is difficult to determine the attributes (in the ID3 sense) that are relevant
to the problem at hand.
 For example, there is a proprietary system, which includes a backprop component, for
assisting in classifying Pap smears.
o The system picks out from the image the most suspicious-looking cells.
o A human expert then inspects these cells.
o This reduces the problem from looking at maybe 10,000 cells to looking at
maybe 100 cells - this reduces the boredom-induced error rate.
 Other successful systems have been built for tasks like reading handwritten postcodes.

Counter propagation network architecture

• An example of a hybrid network which combines the features of two or more basic network
designs. Proposed by Hecht-Nielsen in 1986.
• The hidden layer is a Kohonen network with unsupervised learning and the output layer is a
Grossberg (outstar) layer fully connected to the hidden layer. The output layer is trained by
the Widrow-Hoff rule.
• Allows the output of a pattern rather than a simple category number.
• Can also be viewed as a bidirectional associative memory.
• Figure 2.1 shows a unidirectional counter propagation network used for mapping pattern A
of size n to pattern B of size m.
• The output of the A subsection of the input layer is fanned out to the competitive middle
layer. Each neuron in the output layer receives a signal corresponding to the input pattern's
category along one connection from the middle layer.
• The B subsection of the input layer has zero input during actual operation of the network
and is used to provide input only during training.


• The role of the output layer is to produce the pattern corresponding to the category output
by the middle layer. The output layer uses a supervised learning procedure, with direct
connection from the input layer's B subsection providing the correct output.
 Training is a two-stage procedure. First, the Kohonen layer is trained on input patterns. No
changes are made in the output layer during this step. Once the middle layer is trained to
correctly categorise all the input patterns, the weights between the input and middle layers
are kept fixed and the output layer is trained to produce correct output patterns by
adjusting weights between the middle and output layers.

Fig 2.1: Counter Propagation Network


CPN Functioning and Training algorithm
Training algorithm stage 1:
1. Apply normalised input vector x to input A.
2. Determine winning node in the Kohonen layer.
3. Update winning node's weight vector −
w(t+1) = w(t) + α(x – w), where α is the learning rate.
4. Repeat steps 1 through 3 until all vectors have been processed.
5. Repeat steps 1 to 4 until all input vectors have been learned.

Training algorithm stage 2:


1. Apply normalised input vector x and its corresponding output vector y, to inputs A and B
respectively.
2. Determine winning node in the Kohonen layer.
3. Update weights on the connections from the winning node to the output unit –
wi(t+1) = wi(t) + α(yi – wi), where α is the learning rate.
4. Repeat steps 1 through 3 until all vectors of all classes map to satisfactory outputs.
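The two training stages can be sketched as follows; the winner-by-distance rule and the learning rates a and b are illustrative assumptions, not fixed by the notes:

```python
def winner(kohonen_w, x):
    """Index of the Kohonen unit whose weight vector is closest to x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in kohonen_w]
    return dists.index(min(dists))

def train_stage1(kohonen_w, xs, a=0.5, epochs=20):
    """Stage 1: move each winning Kohonen weight vector toward its input."""
    for _ in range(epochs):
        for x in xs:
            j = winner(kohonen_w, x)
            kohonen_w[j] = [w + a * (xi - w) for w, xi in zip(kohonen_w[j], x)]

def train_stage2(kohonen_w, grossberg_w, pairs, b=0.5, epochs=20):
    """Stage 2: Kohonen weights stay fixed; train the outstar layer on targets."""
    for _ in range(epochs):
        for x, y in pairs:
            j = winner(kohonen_w, x)
            grossberg_w[j] = [w + b * (yi - w) for w, yi in zip(grossberg_w[j], y)]

# Map two 2-bit input patterns to scalar outputs 0.0 and 1.0.
kohonen = [[0.9, 0.1], [0.1, 0.9]]
grossberg = [[0.0], [0.0]]
pairs = [([1.0, 0.0], [0.0]), ([0.0, 1.0], [1.0])]
train_stage1(kohonen, [x for x, _ in pairs])
train_stage2(kohonen, grossberg, pairs)
```

After both stages, presenting an input selects a middle-layer category, and the outstar weights attached to that category reproduce the associated output pattern.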

Characteristics of CPN
• In the first training phase, if a hidden-layer unit does not win for a long period of time, its
weights should be set to random values to give that unit a chance to win subsequently.
• There is no need for normalizing the training output vectors.


• After the training has finished, the network maps the training vectors onto output vectors
that are close to the desired ones.
• The more hidden units, the better the mapping.
• Thanks to the competitive neurons in the hidden layer, the linear neurons can realize
nonlinear mappings.

Hopfield/ Recurrent network

The Hopfield network is an implementation of a learning matrix with recurrent links. The
learning matrix is a weight matrix which actually stores associations between inputs and
targets. This network generalizes in the sense that it identifies general dependencies in the
given incomplete and noisy training data; in this sense it resembles a learning matrix. This kind
of network is a linear model, as it can model only linearly separable data.

The Hopfield Type Network is a multiple-loop feedback neural computation system. The
neurons in this network are connected to all other neurons except to themselves that is there
are no self-feedbacks in the network. A connection between two neurons Ni and Nj is two way
which is denoted by wij. The connection wij from the output of neuron i to the input of neuron j
has the same strength as the connection wji from the output of neuron j to the input of neuron
i, in other words the weight matrix is symmetric. Each neuron computes the summation:

si = ∑j=1..n wji xj

where: n is the number of neurons in the network, wji are the weights, and xj are inputs,
1<=j<=n.

The Hopfield network can be made to operate in either continuous or discrete mode. Here it is
considered that the network operates in discrete mode using neurons with discrete activation
functions. When a neuron fires, its discrete activation function is evaluated and the following
output is produced: x'i = 1 if si = ∑j=1..n wji xj ≥ 0, or x'i = 0 if si < 0.
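A minimal sketch of this discrete update rule; the tiny two-neuron weight matrix is our own example:

```python
def update_neuron(w, x, i):
    """Fire neuron i: s_i = Σ_j w[j][i]·x[j]; output 1 if s_i >= 0, else 0."""
    s = sum(w[j][i] * x[j] for j in range(len(x)))
    return 1 if s >= 0 else 0

# Symmetric weight matrix with zero diagonal (no self-feedback): the
# positive off-diagonal weight ties the two units together.
w = [[0.0, 1.0],
     [1.0, 0.0]]
x = [1, 0]
x[1] = update_neuron(w, x, 1)   # s_1 = w[0][1]·x[0] = 1 >= 0, so unit 1 fires to 1
```

Note that the matrix satisfies wij = wji, the symmetry condition stated above; updating one neuron at a time like this is the asynchronous mode described below.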


Fig 2.2: Hopfield Network

Configuration:
Hopfield networks can be implemented to operate in two modes:
- Synchronous mode of training Hopfield networks means that all neurons fire at the same time.
- Asynchronous mode of training Hopfield networks means that the neurons fire at random.

Stability constraints:
The recurrent networks of the Hopfield type are complex dynamical systems whose behavior is
determined by the connectivity pattern between the neurons. The inputs to the neurons are
not simply externally provided inputs but outputs from other neurons that change with the
time. The temporal behavior of the network implies characteristics that have to be taken into
consideration in order to examine the network performance.
The Hopfield networks are dynamical systems whose state changes with the time. The state of
the neural network is the set of the outputs of all neurons at a particular moment, time instant.
When a neuron fires then its output changes and so the network state also changes. Therefore
the sequence of neuron firings leads to a corresponding sequence of modified neuron outputs,
and modified system states. Acquiring knowledge of the state space allows us to study the
motion of the neural network in time. The trajectories that the network leaves in the time may
be taken to make a state portrait of the system.
A Hopfield net with n neurons has 2^n possible states, assuming that each neuron output
produces two values 0 and 1. Performance analysis of the network behavior can be carried out
by developing a state table that lists all possible subsequent states.

Associative Memory:


These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns and at the time of giving an output they can produce one of the stored
patterns by matching them with the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM). Associative memory makes a parallel search with the
stored patterns as data files.
Following are the two types of associative memories we can observe −
 Auto Associative Memory
 Hetero Associative memory

Auto Associative Memory


This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.
Architecture
As shown in the following figure, the architecture of Auto Associative memory network has
number of input training vectors and similar number of output target vectors.

Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to n)
Step 5 − Adjust the weights as follows −
wij(new)=wij(old)+xiyj
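The steps above can be sketched as follows; the recall threshold (net input > 0) is an assumed convention for 0/1 patterns, and the helper names are our own:

```python
def train_auto_associative(patterns, n):
    """Hebb rule: w_ij(new) = w_ij(old) + x_i·y_j, with target y = input x."""
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                w[i][j] += s[i] * s[j]
    return w

def recall(w, x, n):
    """Net input y_in_j = Σ_i x_i·w_ij, thresholded at > 0."""
    return [1 if sum(x[i] * w[i][j] for i in range(n)) > 0 else 0
            for j in range(n)]

# Store one 3-bit pattern, then present a noisy version of it.
w = train_auto_associative([[1, 0, 1]], 3)
restored = recall(w, [1, 0, 0], 3)
```

Presenting the damaged input [1, 0, 0] still recovers the stored pattern, which is exactly the content-addressable behavior described above.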

Fig 2.3: Auto Associative Memory


Hetero Associative memory


Similar to Auto Associative Memory network, this is also a single layer neural network.
However, in this network the input training vector and the output target vectors are not the
same. The weights are determined so that the network stores a set of patterns. Hetero
associative network is static in nature, hence, there would be no non-linear and delay
operations.
As shown in the following figure, the architecture of Hetero Associative Memory network has
number of input training vectors and number of output target vectors.
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to m)
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to m)
Step 5 − Adjust the weights as follows −
wij(new)=wij(old)+xiyj

Fig 2.4: Hetero Associative Memory

Characteristics of Associative Memory


Characteristics of an autocorrelation or cross correlation associative memory largely depend on
how items are encoded in pattern vectors to be stored. When most of the components of
encoded patterns to be stored are 0 and only a small ratio of the components is 1, the encoding
scheme is said to be sparse. The memory capacity and information capacity of a sparsely
encoded associative memory are analyzed in detail, and are proved to be proportional to
n²/(log n)², n being the number of neurons, which is very large compared with the ordinary
non-sparse encoding scheme of about 0.15n. Moreover, it is proved that the sparsely encoded
associative memory has a large basin of attraction around each memorized pattern, when and
only when an activity control mechanism is attached to it.


Limitations of Associative Memory


The main limitation of associative memory is the efficiency of access and retrieval of patterns
stored in the database. In the case of a badly damaged pattern, it may be unable to restore it.

Applications of Associative Memory


Following are the application areas of associative memory:
1. Pattern Recognition, e.g. face, signature etc.
2. Content Addressable Storage (CAS)
3. Clustering
4. Encoding and Decoding of Data

Hopfield machine
The standard binary Hopfield network is a recurrently connected network with the following
features:
 symmetrical connections: if there is a connection going from unit j to unit i having a
connection weight equal to W_ij then there is also a connection going from unit i to unit
j with an equal weight.
 linear threshold activation: if the total weighted summed input (dot product of input
and weights) to a unit is greater than or equal to zero, its state is set to 1, otherwise it is
-1. Normally, the threshold is zero. Note that the Hopfield network for the travelling
salesman problem (assignment 3) behaved slightly differently from this.
 asynchronous state updates: units are visited in random order and updated according to
the above linear threshold rule.
 Energy function: it can be shown that the above state dynamics minimizes an energy
function.
 Hebbian learning
The most important features of the Hopfield network are:
 Energy minimization during state updates guarantees that it will converge to a stable
attractor.
 The learning (weight updates) also minimizes energy; therefore, the training patterns
will become stable attractors (provided the capacity has not been exceeded).
However, there are some serious drawbacks to Hopfield networks:
 Capacity is only about .15 N, where N is the number of units.
 Local energy minima may occur, and the network may therefore get stuck in very poor
(high Energy) states which do not satisfy the "constraints" imposed by the weights very
well at all. These local minima are referred to as spurious attractors if they are stable
attractors which are not part of the training set. Often, they are blends of two or more
training patterns.

Boltzmann machine
The binary Boltzmann machine is very similar to the binary Hopfield network, with the addition
of three features:


• Stochastic activation function: the state a unit is in is probabilistically related to its
energy gap. The bigger the energy gap between its current state and the opposite state,
 Temperature and simulated annealing: the probability that a unit is on is computed
according to a sigmoid function of its total weighted summed input divided by T. If T is
large, the network behaves very randomly. T is gradually reduced and at each value of T,
all the units' states are updated. Eventually, at the lowest T, units are behaving less
randomly and more like binary threshold units.
 Contrastive Hebbian Learning: A Boltzmann machine is trained in two phases, "clamped"
and "unclamped". It can be trained either in supervised or unsupervised mode. Only the
supervised mode was discussed in class; this type of training proceeds as follows, for
each training pattern:
1. Clamped Phase: The input units' states are clamped to (set and not permitted to
change from) the training pattern, and the output units' states are clamped to
the target vector. All other units' states are initialized randomly, and are then
permitted to update until they reach "equilibrium" (simulated annealing). Then
Hebbian learning is applied.
2. Unclamped Phase: The input units' states are clamped to the training pattern. All
other units' states (both hidden and output) are initialized randomly, and are
then permitted to update until they reach "equilibrium". Then anti-Hebbian
learning (Hebbian learning with a negative sign) is applied.
The above two-phase learning rule must be applied for each training pattern, and for a
great many iterations through the whole training set. Eventually, the output units' states
should become identical in the clamped and unclamped phases, and so the two learning
rules exactly cancel one another. Thus, at the point when the network is always
producing the correct responses, the learning procedure naturally converges and all
weight updates approach zero.
The stochasticity enables the Boltzmann machine to overcome the problem of getting
stuck in local energy minima, while the contrastive Hebb rule allows the network to be
trained with hidden features and thus overcomes the capacity limitations of the
Hopfield network. However, in practice, learning in the Boltzmann machine is hopelessly
slow.
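The stochastic activation with temperature can be sketched as follows; the function names and the ±1 state convention are our own:

```python
import math
import random

def p_on(net_input, T):
    """Probability the unit turns on: sigmoid of net input divided by T."""
    return 1.0 / (1.0 + math.exp(-net_input / T))

def stochastic_update(net_input, T, rng):
    """Flip a coin biased by p_on; return the unit's new state (+1 or -1)."""
    return 1 if rng.random() < p_on(net_input, T) else -1
```

At large T the probability is near 0.5 regardless of the input (near-random behavior); as T is lowered toward zero the probability saturates at 0 or 1 and the unit behaves like the deterministic threshold unit of the Hopfield net, which is the annealing schedule described above.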

Comparison between Hopfield v/s Boltzman


Hopfield networks have following limitations:
1. suffer from spurious local minima that form on the energy hyper surface
2. require the input patterns to be uncorrelated
3. are limited in capacity of patterns that can be stored
4. are usually fully connected and not stacked
Restricted Boltzmann Machines (RBMs) avoid the spurious solutions that arise in Hopfield
Networks by adding in hidden nodes and then sampling over all possible nodes using Boltzman
statistics:
This is not the only difference, however, as older studies of Hopfield networks also looked at
adding temperature (i.e. the Little model for spin glasses).


Spin Glass Models of Neural Networks


(* these models have spurious states that form if you are too far below the critical
temperature; if you place too many patterns in a Hopfield network, it undergoes a spin glass
transition.)
It is known, however, that introducing even simple statistical thermal fluctuations leads to 2p
stable ground states that are correlated with the input patterns. This behavior is not seen in T=0
Hopfield networks, and it appears that introducing the Boltzmann statistics is critical.
Furthermore, this behavior emerges in 2 very different models, the generalized Hopfield model
and the Little model, so it seems it would also appear in the RBM model.
So adding the hidden variables + temperature has the effect of introducing these kinds of
thermal fluctuations, which create a large number of stable memories without low lying meta
stable states.

Adaptive Resonance Theory: Architecture

ART1 neural networks cluster binary vectors, using unsupervised learning. The neat thing about
adaptive resonance theory is that it gives the user more control over the degree of relative
similarity of patterns placed on the same cluster.

An ART1 net achieves stability when it cannot return any patterns to previous clusters (in other
words, a pattern oscillating among different clusters at different stages of training indicates an
unstable net). Some nets achieve stability by gradually reducing the learning rate as the same
set of training patterns is presented many times. However, this does not allow the net to
readily learn a new pattern that is presented for the first time after a number of training epochs
have already taken place. The ability of a net to respond to (learn) a new pattern equally well
at any stage of learning is called plasticity (e.g., this is a computational corollary of the
biological model of neural plasticity). Adaptive resonance theory nets are designed to be both
stable and plastic.

The basic structure of an ART1 neural network involves:

 an input processing field (called the F1 layer) which happens to consist of two
parts:
o an input portion (F1(a))
o an interface portion (F1(b))
 the cluster units (the F2 layer)
 and a mechanism to control the degree of similarity of patterns placed on the
same cluster
 a reset mechanism
 weighted bottom-up connections between the F1 and F2 layers
 weighted top-down connections between the F2 and F1 layers


F1(b), the interface portion, combines signals from the input portion and the F2 layer, for use in
comparing the similarity of the input signal to the weight vector for the cluster unit that has
been selected as a candidate for learning.

Fig 2.5: ART Architecture

To control the similarity of patterns placed on the same cluster, there are two sets of
connections (each with its own weights) between each unit in the interface portion of the input
field and the cluster unit. The F1(b) layer is connected to the F2 layer by bottom-up weights
(bij). The F2 layer is connected to the F1(b) layer by top-down weights (tij).
The F2 layer is a competitive layer: The cluster unit with the largest net input becomes the
candidate to learn the input pattern. The activations of all other F2 units are set to zero. The
interface units, F1(b), now combine information from the input and cluster units. Whether or
not this cluster unit is allowed to learn the input pattern depends on how similar its top-down
weight vector is to the input vector. This decision is made by the reset unit, based on signals it
receives from the input F1(a) and interface F1(b) layers. If the cluster unit is not allowed to
learn, it is inhibited and a new cluster unit is selected as the candidate. If a cluster unit is
allowed to learn, it is said to classify a pattern class. Sometimes there is a tie for the winning
neuron in the F2 layer, when this happens, then an arbitrary rule, such as the first of them in a
serial order, can be taken as the winner.
During the operation of an ART1 net, patterns emerge in the F1(a) and F1(b) layers and are
called traces of STM (short-term memory). Traces of LTM (long-term memory) are in the
connection weights between the input layers (F1) and output layer (F2).

Classifications of ART
 ART-1


This is a binary version of ART, i.e., it can cluster binary input vectors.

 ART-2
This is an analogue version of ART, i.e. it can cluster real-valued input vectors.

 ART-2A
This refers to a fast version of the ART2 learning algorithm.

 ART-3
This network is an ART extension that incorporates "chemical transmitters" to control the
search process in a hierarchical ART structure.

 ARTMAP
This is a supervised version of ART that can learn arbitrary mappings of binary patterns.

 Fuzzy ART
This network is a synthesis of ART and fuzzy logic.

 Fuzzy ARTMAP
This is supervised fuzzy ART.

 Distributed ART and ARTMAP (dART and dARTMAP)


These models learn distributed code representations in the F2 layer. In the special case of
winner-take-all F2 layers, they are equivalent to Fuzzy ART and ARTMAP, respectively.

 ART adaptations

 ARTMAP-IC
This network adds distributed prediction and category instance counting to the basic fuzzy
ARTMAP system.

 Gaussian ARTMAP
A supervised-learning ART network that uses Gaussian-defined receptive fields.

 Hierarchical (modular) ART models


These are ART-based modular networks that learn hierarchical clusterings of arbitrary
sequences of input patterns.

 arboART
In this network, the prototype vectors at each layer are used as input to the following layer
(agglomerative method). The architecture is similar to HART-J (see below). It has been applied
to automatic rule generation of Kansei engineering expert systems.


 Cascade Fuzzy ART


A cascade of Fuzzy ART networks that develop hierarchies of analogue and binary patterns
through bottom-up learning guided by a top-down search process. It has been applied to
model-based 3D object recognition.

 HART(-J), HART-S
Modular Hierarchical ART (HART) models. HART-J (also known as HART) implements an
agglomerative clustering method (similar to arboART above). HART-S implements a divisive
clustering method with each ART layer learning the differences between the input and the
matching prototype of the previous layer.

 SMART
Self-consistent Modular ART network, which is capable of learning self-consistent cluster
hierarchies through explicit links and an internal feedback mechanism (much like those of the
ARTMAP network).

 LAPART
An ART-based neural architecture for pattern sequence verification through inferencing.

 MART
Multichannel ART, for adaptive classification of patterns through multiple inputs channels.

 PROBART
A modification to the Fuzzy ARTMAP network (by building up probabilistic information
regarding interlayer node associations) which allows it to learn to approximate noisy mappings.

 R2MAP
An ARTMAP-based architecture capable of learning complex classification tasks by re-iteratively
creating novel (relational) input features to represent the same problem with fewer input
categories. It takes motivation from the representational redescription (RR) hypothesis in
cognitive science.

 TD-ART
Time-Delay ART for learning spatio-temporal patterns.

Implementation and Training


Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −

α > 1 and 0 < ρ ≤ 1

0 < bij(0) < α/(α − 1 + n) and tij(0) = 1

Step 2 − Continue steps 3-9 while the stopping condition is not true.


Step 3 − Continue steps 4-6 for every training input.

Step 4 − Set activations of all F2 and F1(a) units as follows:
F2 = 0 and F1(a) = input vectors
Step 5 − The input signal from F1(a) must be sent to the F1(b) layer:
si = xi
Step 6 − For every inhibited F2 node,
yj = ∑i bij xi; the condition is yj ≠ -1
Step 7 − Perform steps 8-10 while the reset is true.
Step 8 − Find J such that yJ ≥ yj for all nodes j.
Step 9 − Again calculate the activation on F1(b) as follows:
xi = si tJi
Step 10 − Now, after calculating the norm of vector x and vector s, we need to check the reset
condition as follows −
If ||x|| / ||s|| < vigilance parameter ρ, then inhibit node J and go to step 7;
Else if ||x|| / ||s|| ≥ vigilance parameter ρ, then proceed further.
Step 11 − Weight updating for node J can be done as follows −
bij(new) = αxi / (α − 1 + ||x||)
tij(new) = xi
Step 12 − The stopping condition for the algorithm must be checked; it may be as follows −
• No change in weights.
• Reset is not performed for any unit.
• Maximum number of epochs reached.
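The match test and weight update (steps 9-11 above) can be sketched as follows, with α = 2 as an illustrative value and our own helper names:

```python
def norm(v):
    """||v|| for a binary vector: the number of 1s."""
    return sum(v)

def vigilance_test(s, t_J, rho):
    """Accept candidate cluster J only if ||x|| / ||s|| >= ρ, x_i = s_i·t_Ji."""
    x = [si * tji for si, tji in zip(s, t_J)]
    return norm(x) / norm(s) >= rho, x

def update_weights(x, alpha=2.0):
    """b_iJ(new) = α·x_i / (α − 1 + ||x||), t_Ji(new) = x_i."""
    nx = norm(x)
    return [alpha * xi / (alpha - 1 + nx) for xi in x], list(x)

s = [1, 1, 0, 1]
t_J = [1, 1, 1, 1]          # freshly initialized top-down weights (all 1s)
ok, x = vigilance_test(s, t_J, rho=0.7)
b_new, t_new = update_weights(x)
```

Because the fresh top-down vector is all 1s, x equals s and the match ratio is 1, so a new cluster always passes the vigilance test; a partially trained cluster would pass only if it overlaps the input enough.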


UNIT-3 Notes

Introduction
The word "fuzzy" means "vagueness". Fuzziness occurs when the boundary of a piece of
information is not clear-cut. Fuzzy sets have been introduced by Lotfi A. Zadeh (1965) as an
extension of the classical notion of set.
Human thinking and reasoning frequently involve fuzzy information originating from inherently
inexact human concepts. Human can give satisfactory answers, which are probably true.
However our systems are unable to answer many questions. The reason is, most systems are
designed based upon classical set theory and two-valued logic which is unable to cope with
unreliable and incomplete information and give expert opinions. Fuzzy set theory is an
extension of classical set theory where elements have degree of membership.

Fuzzy Set Theory


Fuzzy set theory is an extension of classical set theory where elements have varying degrees of
membership. A logic based on the two truth values, True and False, is sometimes inadequate
when describing human reasoning. Fuzzy logic uses the whole interval between 0 (false) and
1 (true) to describe human reasoning. A fuzzy set is any set that allows its members to have
different degrees of membership, given by a membership function, in the interval [0, 1].

Fuzzy Set and Crisp Set


In classical set theory the characteristic function has only values 0(false) and 1(true). Such sets
are crisp set.
For fuzzy sets the characteristic function can be defined as:
• The characteristic function for the crisp set is generalized for the fuzzy sets.
• This generalized characteristic function is called the membership function.
Crisp set theory is not capable of representing descriptions and classification in many cases; In
fact, Crisp set does not provide adequate representation for most cases.

Crisp Relation

A crisp relation is used to represent the presence or absence of interaction, association, or
interconnectedness between the elements of more than one set. This crisp relational concept
can be generalized to allow for various degrees or strengths of relation or interaction between
elements.


Operations on Crisp Relations

Let A and B be two relations defined on X x Y and are represented by relational matrices. The
following operations can be performed on these relations A and B:

Union: A ∪ B (x,y) = max [ A (x,y), B (x,y) ]

Intersection: A ∩ B (x,y) = min [ A (x,y), B (x,y) ]
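These element-wise max/min operations can be sketched directly on relational matrices (the representation as nested lists is our own choice):

```python
def rel_union(A, B):
    """Element-wise max of two relational matrices over the same X x Y."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rel_intersection(A, B):
    """Element-wise min of two relational matrices over the same X x Y."""
    return [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 0], [0, 1]]
B = [[1, 1], [0, 0]]
```

For crisp relations the entries are only 0 or 1, so max and min reduce to logical OR and AND; the same code works unchanged for fuzzy relational matrices with grades in [0, 1].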

Fuzzy relation

Degrees of association can be represented by grades of membership in a fuzzy relation in the
same way as degrees of set membership are represented in the fuzzy set. In fact, just as the
crisp set can be viewed as a restricted case of the more general fuzzy set concept, the crisp
relation can be considered to be a restricted case of the fuzzy relations.

Operations on Fuzzy Relations


Having two fuzzy sets A˜ and B˜, the universe of information U and an element y of the
universe, the following relations express the union, intersection and complement operation on
fuzzy sets.

1. Union/Fuzzy OR
Let us consider the following representation to understand how the Union/Fuzzy OR relation
works −
μA˜∪B˜(y) = μA˜(y) ∨ μB˜(y), ∀y ∈ U
Here ∨ represents the 'max' operation.

2. Intersection/Fuzzy AND

Let us consider the following representation to understand how the Intersection/Fuzzy AND
relation works −


μA˜∩B˜(y) = μA˜(y) ∧ μB˜(y), ∀y ∈ U

Here ∧ represents the 'min' operation.

3. Complement/Fuzzy NOT

Let us consider the following representation to understand how the Complement/Fuzzy NOT
relation works −
μA˜′(y) = 1 − μA˜(y), ∀y ∈ U
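The three operations can be sketched on finite fuzzy sets represented as dictionaries of membership grades (this representation and the example grades are our own):

```python
def fuzzy_union(A, B):
    """μ(A∪B)(y) = max(μA(y), μB(y)) for every y in the universe."""
    return {y: max(A[y], B[y]) for y in A}

def fuzzy_intersection(A, B):
    """μ(A∩B)(y) = min(μA(y), μB(y)) for every y in the universe."""
    return {y: min(A[y], B[y]) for y in A}

def fuzzy_complement(A):
    """μA'(y) = 1 − μA(y) for every y in the universe."""
    return {y: 1 - A[y] for y in A}

A = {"a": 0.25, "b": 0.75}
B = {"a": 0.5, "b": 0.5}
```

When every grade is 0 or 1 these reduce to the classical set operations, illustrating how the crisp case is a restriction of the fuzzy one.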

Fuzzy systems: Crisp logic

Crisp logic (crisp) is the same as Boolean logic (either 0 or 1). Either a statement is true (1) or it
is not (0), meanwhile fuzzy logic captures the degree to which something is true. Consider the
statement: "They agreed to meet at 12 o'clock but Ben was not punctual."

• Crisp logic: If Ben showed up precisely at 12, he is punctual; otherwise he is too early or
too late.
• Fuzzy logic: The degree to which Ben was punctual can be identified by how much
earlier or later he showed up (e.g. 0 if he showed up at 11:45 or 12:15, 1 at 12:00, and a
linear increase / decrease in between).
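The fuzzy reading of the example can be written as a membership function; the 15-minute window is the tolerance assumed in the example:

```python
def punctuality(minutes_off_schedule):
    """Degree of punctuality: 1 at exactly 12:00, falling linearly to 0
    at 15 minutes early (negative) or late (positive)."""
    d = abs(minutes_off_schedule)
    return max(0.0, 1.0 - d / 15.0)
```

So showing up at 12:00 gives membership 1, at 12:15 or 11:45 membership 0, and at 12:07:30 membership 0.5, rather than the all-or-nothing verdict of crisp logic.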

The term "crisp" appears many times in the closely related fuzzy set theory (FS), where it has
been used to distinguish Cantor's set theory from Zadeh's set theory.


Fuzzy Logic
Fuzzy logic is derived from fuzzy set theory dealing with reasoning that is approximate rather
than precisely deduced from classical predicate logic. Fuzzy logic is capable of handling
inherently imprecise concepts. Fuzzy logic allows set membership values to be expressed in
linguistic form for imprecise concepts like "slightly", "quite" and "very".

Membership Function

Definition: a membership function for a fuzzy set A on the universe of discourse X is defined as
µA: X → [0, 1], where each element of X is mapped to a value between 0 and 1. This value,
called membership value or degree of membership, quantifies the grade of membership of the
element of X in the fuzzy set A.

Membership functions allow us to graphically represent a fuzzy set. The x axis represents the
universe of discourse, whereas the y axis represents the degrees of membership in the [0, 1]
interval.

Simple functions are used to build membership functions. Because we are defining fuzzy
concepts, using more complex functions does not add more precision.

Some of the Fuzzy member functions are as follows:

1. Triangular MFs

A triangular MF is specified by three parameters {a, b, c} as follows:

triangle(x; a, b, c) = max( min( (x − a)/(b − a), (c − x)/(c − b) ), 0 )

The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three corners of the
underlying triangular MF.
2. Trapezoidal MFs

A trapezoidal MF is specified by four parameters {a, b, c, d} as follows:

trapezoid(x; a, b, c, d) = max( min( (x − a)/(b − a), 1, (d − x)/(d − c) ), 0 )

The parameters {a, b, c, d} (with a < b <= c < d) determine the x coordinates of the four corners
of the underlying trapezoidal MF.

3. Gaussian MFs

A Gaussian MF is specified by two parameters {c, σ}:

gaussian(x; c, σ) = exp( −(1/2) ((x − c)/σ)² )

A Gaussian MF is determined completely by c and σ; c represents the MF's centre and
σ determines the MF's width.
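The three membership functions can be sketched directly from their standard formulas (parameter names follow the text; the sample parameter values are illustrative):

```python
# Standard triangular, trapezoidal and Gaussian membership functions.
import math

def triangular(x, a, b, c):
    # Rises from a to the peak at b, falls back to zero at c
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal(x, a, b, c, d):
    # Rises from a to b, stays at 1 until c, falls to zero at d
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, c, sigma):
    # Bell curve centred at c with width sigma
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

print(triangular(25, 20, 30, 40))       # 0.5: halfway up the left slope
print(trapezoidal(35, 20, 30, 40, 50))  # 1.0: inside the flat top
print(gaussian(30, 30, 5))              # 1.0: at the centre
```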

Fuzzy rule base system:

Fuzzy rules are linguistic IF-THEN constructions that have the general form "IF A THEN B",
where A and B are (collections of) propositions containing linguistic variables. A is called the
premise and B is the consequence of the rule. In effect, the use of linguistic variables and fuzzy
IF-THEN rules exploits the tolerance for imprecision and uncertainty. In this respect, fuzzy logic
mimics the crucial ability of the human mind to summarize data and focus on decision-relevant
information.

In a more explicit form, if there are I rules each with K premises in a system, the ith rule has the
following form.

In the above equation, a represents the crisp inputs to the rule and A and B are linguistic
variables. The connective operator can be AND, OR or XOR.

Example: If a HIGH flood is expected and the reservoir level is MEDIUM, then water release is
HIGH.


Several rules constitute a fuzzy rule-based system.

Another example comes from Kosko (1993). Figure 4.1 below illustrates the notion of a simple
fuzzy rule with one input and one output applied to the problem of an air motor speed
controller for air conditioning. Rules are given. Let us say the temperature is 22 degrees. This
temperature is "right" to a degree of 0.6 and "warm" to a degree of 0.2, and it belongs to all
others to a degree of zero. This activates two of the rules shown in Figure 4.2. The rule
responses are combined to give those shown in Figure 4.3 (thick lines).

Fig: 4.1 Fuzzy Rule Base Example

Note 1. Air motor speed controller, Temperature (input) and speed (output) are fuzzy variables
used in the set of rules.
Note 2. A temperature of 22 deg. "fires" two fuzzy rules. The resulting fuzzy value for air motor
speed is "defuzzified".
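The rule firing at 22 degrees can be sketched as follows; the membership degrees (0.6 "right", 0.2 "warm") come from the text, while the rule base mapping temperature terms to speed terms is a hypothetical stand-in for the rules in Figure 4.1:

```python
# Rule firing for the air-conditioner example: temperature memberships
# activate rules to the degree of their premise.

# Membership degrees at 22 degrees, as given in the text
temp_membership = {"right": 0.6, "warm": 0.2, "cool": 0.0, "cold": 0.0, "hot": 0.0}

# Hypothetical rule base: IF temperature is X THEN motor speed is Y
rules = {"cold": "stop", "cool": "slow", "right": "medium", "warm": "fast", "hot": "blast"}

# Each rule fires to the degree of its premise; zero-degree rules stay silent
fired = {rules[t]: m for t, m in temp_membership.items() if m > 0}
print(fired)  # {'medium': 0.6, 'fast': 0.2} -- exactly two rules fire
```

The fired consequents would then be combined and defuzzified into a single crisp motor speed, as the notes describe.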


Fuzzy Proposition
Fuzzy propositions are assigned to fuzzy sets. Suppose a fuzzy proposition P is assigned to a
fuzzy set A; then the truth value of the proposition is proposed as T(P) = μA(x), where
0 ≤ μA(x) ≤ 1.
Therefore the truthness of a proposition P is the membership value of x in the fuzzy set A.
The logical connectives like disjunction, conjunction, negation and implication are also defined
on fuzzy propositions.

Fuzzy Proposition Formation


Let a fuzzy proposition P be defined on a fuzzy set A and Q be defined on a fuzzy set B. Then:
1. Conjunction

P ∧ Q : x is A and B

T(P ∧ Q) = min [ T(P), T(Q) ]

2. Negation

T(Pc) = 1 – T(P)

3. Disjunction

P ∨ Q : x is A or B

T(P ∨ Q) = max [ T(P), T(Q) ]

4. Implication

P → Q : IF x is A THEN y is B

T(P → Q) = T(Pc ∨ Q) = max [ T(Pc), T(Q) ]

If P is a proposition defined on set A on universe of discourse X and Q is another proposition

defined on set B on universe of discourse Y, then the implication P → Q can be represented by
the relation R:

R = (A × B) ∪ (Ac × Y) = IF x is A THEN y is B, i.e. if x ∈ A (where x ∈ X and A ⊂ X) then y ∈ B

(where y ∈ Y and B ⊂ Y)
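The connectives above can be sketched as simple min/max operations on truth values in [0, 1]:

```python
# Truth-value connectives for fuzzy propositions: conjunction is min,
# disjunction is max, negation is 1 - T(P), and implication is T(Pc or Q).

def t_and(p, q):
    return min(p, q)

def t_or(p, q):
    return max(p, q)

def t_not(p):
    return 1.0 - p

def t_implies(p, q):
    # T(P -> Q) = max(T(Pc), T(Q)) = max(1 - T(P), T(Q))
    return max(t_not(p), q)

print(t_and(0.7, 0.4))      # 0.4
print(t_or(0.7, 0.4))       # 0.7
print(t_implies(0.7, 0.4))  # 0.4, i.e. max(0.3, 0.4)
```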

Fuzzy Decomposition


Fuzzy decomposition can be done using IF-THEN rules. It is a known fact that a human being is
always comfortable making conversations in natural language. The representation of human
knowledge can be done with the help of the following natural language expression −
IF antecedent THEN consequent
The expression as stated above is referred to as the fuzzy IF-THEN rule base.

General format: IF x is A THEN y is B

Examples:

 If pressure is high, then volume is small
 If the road is slippery, then driving is dangerous
 If a tomato is red, then it is ripe
 If the speed is high, then apply the brake a little.

Interpretations of Fuzzy IF-THEN Rules


Fuzzy IF-THEN rules can be interpreted in the following forms −

Assignment Statements
These kinds of statements use the "=" (equal to) sign for the purpose of assignment. They are
of the following form −
a = hello
climate = summer

Conditional Statements
These kinds of statements use the IF-THEN rule base form for the purpose of condition. They
are of the following form −
IF temperature is high THEN climate is hot
IF food is fresh THEN eat.

Unconditional Statements
They are of the following form −
GOTO 10
turn the fan off

Fuzzy Aggregation:

Having generated the truth of each rule in the rule base, we often have to aggregate them to
evaluate the truth of some derived statement. When aggregating a large number of fuzzy rules
(``clauses'') into a single fuzzy value, we can use the min, the max or a generalized mean
operator. In the case of rare events, the attributes are distributed in such a way that, except at
the extremes, the non-events always outnumber the events. Optimal performance would,
therefore, be achieved when the fuzzy membership functions are non-zero only in the tail-end
of the distributions.
However, in the presence of one or more zeros in the clauses, we found that the minimum and
maximum operators failed because of their dependence on all or none of their arguments


holding while the generalized means got diluted. Consequently, we use an aggregation
operator defined as:

It can be readily verified that this operator satisfies the requirements of an aggregation operator. It
satisfies the boundary conditions and is monotonic non-decreasing in all its arguments. In addition, since
we sort the arguments in descending order before doing the weighted addition, it is symmetric in its
arguments.

Fuzzy Reasoning:
Following are the different modes of approximate reasoning −
Categorical Reasoning
In this mode of approximate reasoning, the antecedents, containing no fuzzy quantifiers and
fuzzy probabilities, are assumed to be in canonical form.

Qualitative Reasoning
In this mode of approximate reasoning, the antecedents and consequents have fuzzy linguistic
variables; the input-output relationship of a system is expressed as a collection of fuzzy IF-THEN
rules. This reasoning is mainly used in control system analysis.

Syllogistic Reasoning
In this mode of approximate reasoning, antecedents with fuzzy quantifiers are related to
inference rules. This is expressed as −
x = S1 A's are B's
y = S2 C's are D's
------------------------
z = S3 E's are F's

Here A, B, C, D, E, F are fuzzy predicates.

 S1 and S2 are given fuzzy quantifiers.
 S3 is the fuzzy quantifier which has to be decided.

Dispositional Reasoning
In this mode of approximate reasoning, the antecedents are dispositions that may contain the
fuzzy quantifier "usually". The quantifier "usually" links together dispositional and syllogistic
reasoning; hence it plays an important role.

For example, the projection rule of inference in dispositional reasoning can be given as follows:

usually ( (L, M) is R ) ⇒ usually ( L is [R ↓ L] )
Here [R ↓ L] is the projection of the fuzzy relation R on L.


Fuzzy Inference System


Fuzzy Inference System (FIS) is the key unit of a fuzzy logic system, having decision making as
its primary work. It uses the IF…THEN rules along with connectors "OR" or "AND" for drawing
essential decision rules.

Characteristics of Fuzzy Inference System

 The output from an FIS is always a fuzzy set, irrespective of its input, which can be fuzzy
or crisp.
 It is necessary to have a non-fuzzy (crisp) output when the FIS is used as a controller.
 A defuzzification unit would be there with the FIS to convert fuzzy variables into crisp
variables.

Fig 4.2: Block Diagram of FIS. Crisp input → Fuzzification Interface Unit → Decision-Making
Unit → Defuzzification Interface Unit → crisp output, with a Knowledge Base (Database and
Rule Base) supporting the fuzzification, decision-making and defuzzification units.


Functional Blocks of FIS
The following five functional blocks will help you understand the construction of an FIS −
Rule Base − It contains fuzzy IF-THEN rules.
Database − It defines the membership functions of the fuzzy sets used in the fuzzy rules.
Decision-making Unit − It performs operations on the rules.
Fuzzification Interface Unit − It converts the crisp quantities into fuzzy quantities.
Defuzzification Interface Unit − It converts the fuzzy quantities into crisp quantities. A block
diagram of the fuzzy inference system is shown in Fig 4.2 above.

Working of FIS
The working of the FIS consists of the following steps −


 A fuzzification unit supports the application of numerous fuzzification methods, and
converts the crisp input into fuzzy input.
 A knowledge base (a collection of rule base and database) is formed upon the
conversion of crisp input into fuzzy input.
 The fuzzy input is finally converted into crisp output by the defuzzification unit.
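The fuzzification → decision-making → defuzzification pipeline can be sketched end to end for a toy fan-speed controller; the membership functions, the rule base and the weighted-average defuzzification method are illustrative assumptions, not from the text:

```python
# A toy FIS: crisp temperature in, crisp fan speed out.

def mu_cold(t):
    # Fully cold at or below 10, fades out by 20
    return max(0.0, min(1.0, (20 - t) / 10))

def mu_hot(t):
    # Starts at 20, fully hot at or above 30
    return max(0.0, min(1.0, (t - 20) / 10))

def fis(temp):
    # 1. Fuzzification: crisp temperature -> membership degrees
    cold, hot = mu_cold(temp), mu_hot(temp)
    # 2. Rule base: IF cold THEN speed 10; IF hot THEN speed 90
    # 3. Defuzzification by the weighted-average (height) method
    if cold + hot == 0:
        return 50.0  # neither rule fires: fall back to a neutral speed
    return (cold * 10 + hot * 90) / (cold + hot)

print(fis(15))  # 10.0: only the "cold" rule fires
print(fis(25))  # 90.0: only the "hot" rule fires
```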

Fuzzy Decision Making


It is an activity which includes the steps to be taken for choosing a suitable alternative from
those that are needed for realizing a certain goal.

Steps for Decision Making

Let us now discuss the steps involved in the decision making process −

 Determining the Set of Alternatives − In this step, the alternatives from which the
decision has to be taken must be determined.
 Evaluating Alternatives − Here, the alternatives must be evaluated so that the decision
can be taken about one of the alternatives.
 Comparison between Alternatives − In this step, a comparison between the evaluated
alternatives is done.

Types of Decision
1. Individual Decision Making

In this type of decision making, only a single person is responsible for taking decisions. The
decision making model in this kind can be characterized as −

 Set of possible actions;
 Set of goals Gi (i ∈ Xn);
 Set of constraints Cj (j ∈ Xm).

The goals and constraints stated above are expressed in terms of fuzzy sets.

Now consider a set A. Then, the goals and constraints for this set are given by −

Gi(a) = composition [Gi(a)] = G1i (Gi(a)), with G1i

Cj(a) = composition [Cj(a)] = C1j (Cj(a)), with C1j, for a ∈ A

The fuzzy decision in the above case is given by −

FD = min [ inf (i∈Xn) Gi(a), inf (j∈Xm) Cj(a) ]
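The fuzzy decision FD can be sketched as a minimum over all goal and constraint memberships, in the Bellman-Zadeh style suggested by the formula above; the alternatives and membership values below are illustrative:

```python
# Fuzzy individual decision making: the decision membership of an
# alternative is the minimum of its goal and constraint memberships.

goals = {"a1": [0.8, 0.6], "a2": [0.9, 0.9]}   # Gi(a) for each alternative
constraints = {"a1": [0.7], "a2": [0.4]}       # Cj(a) for each alternative

def fuzzy_decision(a):
    # FD = min[ inf Gi(a), inf Cj(a) ]
    return min(min(goals[a]), min(constraints[a]))

best = max(goals, key=fuzzy_decision)
print(best, fuzzy_decision(best))  # a1 0.6: a2 is capped by its weak constraint
```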


2. Multi-person Decision Making

Decision making in this case includes several persons so that the expert knowledge from various
persons is utilized to make decisions.

Calculation for this is given as follows −

Number of persons preferring xi to xj = N(xi, xj)

Total number of decision makers = n

Then, SC(xi, xj) = N(xi, xj) / n

3. Multi-objective Decision Making

Multi-objective decision making occurs when there are several objectives to be realized. There
are the following two issues in this type of decision making −

 To acquire proper information related to the satisfaction of the objectives by the various
alternatives.
 To weigh the relative importance of each objective.

Mathematically we can define a universe of alternatives as −

A = [a1, a2, ..., ai, ..., an]

And the set of objectives as O = [o1, o2, ..., oi, ..., on]

4. Multi-attribute Decision Making

Multi-attribute decision making takes place when the evaluation of alternatives can be carried
out based on several attributes of the object. The attributes can be numerical data, linguistic
data and qualitative data.

Mathematically, the multi-attribute evaluation is carried out on the basis of a linear equation as
follows −

Y = A1X1 + A2X2 + ... + AiXi + ... + ArXr

Applications of Fuzzy Logic


1. Aerospace

In aerospace, fuzzy logic is used in the following areas −

 Altitude control of spacecraft
 Satellite altitude control
 Flow and mixture regulation in aircraft deicing vehicles
2. Automotive

In automotive, fuzzy logic is used in the following areas −

 Trainable fuzzy systems for idle speed control
 Shift scheduling method for automatic transmission
 Intelligent highway systems
 Traffic control
 Improving efficiency of automatic transmissions
3. Business

In business, fuzzy logic is used in the following areas −

 Decision-making support systems
 Personnel evaluation in a large company
4. Defense

In defense, fuzzy logic is used in the following areas −

 Underwater target recognition
 Automatic target recognition of thermal infrared images
 Naval decision support aids
 Control of a hypervelocity interceptor
 Fuzzy set modeling of NATO decision making
5. Electronics

In electronics, fuzzy logic is used in the following areas −

 Control of automatic exposure in video cameras
 Humidity in a clean room
 Air conditioning systems
 Washing machine timing
 Microwave ovens
 Vacuum cleaners
6. Finance

In the finance field, fuzzy logic is used in the following areas −


 Banknote transfer control
 Fund management
 Stock market predictions
7. Industrial Sector

In the industrial sector, fuzzy logic is used in the following areas −

 Cement kiln control, heat exchanger control
 Activated sludge wastewater treatment process control
 Water purification plant control
 Quantitative pattern analysis for industrial quality assurance
 Control of constraint satisfaction problems in structural design
 Control of water purification plants
8. Manufacturing

In the manufacturing industry, fuzzy logic is used in the following areas −

 Optimization of cheese production
 Optimization of milk production
9. Marine

In the marine field, fuzzy logic is used in the following areas −

 Autopilot for ships
 Optimal route selection
 Control of autonomous underwater vehicles
 Ship steering
10. Medical

In the medical field, fuzzy logic is used in the following areas −

 Medical diagnostic support system
 Control of arterial pressure during anesthesia
 Multivariable control of anesthesia
 Modeling of neuropathological findings in Alzheimer's patients
 Radiology diagnoses
 Fuzzy inference diagnosis of diabetes and prostate cancer
11. Securities

In securities, fuzzy logic is used in the following areas −

 Decision systems for securities trading
 Various security appliances


12. Transportation

In transportation, fuzzy logic is used in the following areas −

 Automatic underground train operation
 Train schedule control
 Railway acceleration
 Braking and stopping
13. Pattern Recognition and Classification

In pattern recognition and classification, fuzzy logic is used in the following areas −

 Fuzzy logic based speech recognition
 Fuzzy logic based handwriting recognition
 Fuzzy logic based facial characteristic analysis
 Command analysis
 Fuzzy image search
14. Psychology

In psychology, fuzzy logic is used in the following areas −

 Fuzzy logic based analysis of human behavior
 Criminal investigation and prevention based on fuzzy logic reasoning


UNIT-4 Notes

Genetic Algorithm: Fundamental

Genetic Algorithm (GA) is a search-based optimization technique based on the principles of


Genetics and Natural Selection. It is frequently used to find optimal or near-optimal solutions to
difficult problems which otherwise would take a lifetime to solve. It is frequently used to solve
optimization problems, in research, and in machine learning.

What are Genetic Algorithms?


Nature has always been a great source of inspiration to all mankind. Genetic Algorithms (GAs)
are search based algorithms based on the concepts of natural selection and genetics. GAs are a
subset of a much larger branch of computation known as Evolutionary Computation.

In GAs, we have a pool or a population of possible solutions to the given problem. These
solutions then undergo recombination and mutation (like in natural genetics), producing new
children, and the process is repeated over various generations. Each individual (or candidate
solution) is assigned a fitness value (based on its objective function value) and the fitter
individuals are given a higher chance to mate and yield "fitter" individuals. This is in line
with the Darwinian theory of "Survival of the Fittest".

Genetic Algorithms are sufficiently randomized in nature, but they perform much better than
random local search (in which we just try various random solutions, keeping track of the best so
far), as they exploit historical information as well.

Basic Concepts
Individual: An individual is a single solution. An individual groups together two forms of
solutions as given below:
1. The chromosome, which is the raw genetic information (genotype) that the GA deals with.
2. The phenotype which is the expressive of the chromosome in the terms of the model.
A chromosome is subdivided into genes.
Genes: Genes are the basic instructions for building a GA. A chromosome is a sequence of
genes. Genes may describe a possible solution to a problem, without actually being the
solution. A gene is a bit string of arbitrary length. The bit string is a binary representation of a
number of intervals from a lower bound. A gene is the GA's representation of a single factor
value for a control factor, where the control factor must have an upper bound and a lower bound.
Fitness: The fitness of an individual in a GA is the value of an objective function for its
phenotype. For calculating fitness, the chromosome has to be first decoded and the objective
function has to be evaluated. The fitness not only indicates how good the solution is, but also
corresponds to how close the chromosome is to the optimal one.


In case of multi criterion optimization, the fitness function is definitely more difficult to
determine. In multi criterion optimization problems, there is often a dilemma as how to
determine if one solution is better than another.
Population: A population is a collection of individuals. A population consists of a number of
individuals being tested, the phenotype parameters defining the individuals and some
information about the search space. The two important aspects of population used in GA are:
1. The initial population generation
2. The population size.
For each and every problem, the population size will depend on the complexity of the problem.
It is often a random initialization of population.

Working Principle
Working principle of a GA is as follows −
We start with an initial population (which may be generated at random or seeded by other
heuristics), select parents from this population for mating. Apply crossover and mutation
operators on the parents to generate new off-springs. And finally these off-springs replace the
existing individuals in the population and the process repeats, see figure 1.1. In this way genetic
algorithms actually try to mimic the human evolution to some extent.
A generalized pseudo-code for a GA is explained in the following program −

Genetic Algorithm()
    initialize population
    find fitness of population
    while (termination criterion is not reached) do
        parent selection
        crossover with probability pc
        mutation with probability pm
        decode and fitness calculation
        survivor selection
        find best
    return best
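The pseudo-code can be turned into a runnable sketch on the toy "OneMax" problem (maximize the number of 1s in a bit string); the population size, rates and tournament selection used here are illustrative choices, not prescribed by the text:

```python
# A minimal GA following the pseudo-code above, maximizing the count
# of 1 bits in a fixed-length bit string.
import random

random.seed(42)
N_BITS, POP_SIZE, PC, PM, GENERATIONS = 20, 30, 0.9, 0.02, 50

def fitness(ind):                      # objective: number of 1 bits
    return sum(ind)

def select(pop):                       # 3-way tournament parent selection
    return max(random.sample(pop, 3), key=fitness)

def crossover(p1, p2):                 # one-point crossover with probability pc
    if random.random() < PC:
        cut = random.randint(1, N_BITS - 1)
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(ind):                       # bitwise mutation with probability pm
    return [1 - b if random.random() < PM else b for b in ind]

# Initialize population, then iterate the generation cycle
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
print(fitness(best))  # typically at or near the optimum of 20 on this toy problem
```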


Fig 1.1: Flow Chart of GA

Encoding: Encoding is a process of representing individual genes. The process can be performed
using bits, numbers, trees, arrays, lists or any other objects. The encoding depends mainly on
the problem being solved.

Types of encoding
One of the most important decisions to make while implementing a genetic algorithm is
deciding the representation that we will use to represent our solutions. It has been observed
that improper representation can lead to poor performance of the GA.
Therefore, choosing a proper representation, having a proper definition of the mappings
between the phenotype and genotype spaces is essential for the success of a GA.
In this section, we present some of the most commonly used representations for genetic
algorithms. However, representation is highly problem specific and the reader might find that
another representation or a mix of the representations mentioned here might suit his/her
problem better.
1. Binary Encoding
This is one of the simplest and most widely used representations in GAs. In this type of
representation the genotype consists of bit strings.
For some problems when the solution space consists of Boolean decision variables – yes or no,
the binary representation is natural. Take for example the 0/1 Knapsack Problem. If there are n


items, we can represent a solution by a binary string of n elements, where the xth element tells
whether the item x is picked (1) or not (0).
2. Real Value Encoding
For problems where we want to define the genes using continuous rather than discrete
variables, the real valued representation is the most natural. The precision of these real valued
or floating point numbers is, however, limited by the computer.
3. Integer Encoding
For discrete valued genes, we cannot always limit the solution space to binary "yes" or "no". For
example, if we want to encode the four directions − North, South, East and West − we can
encode them as {0, 1, 2, 3}. In such cases, integer representation is desirable.
4. Permutation Encoding
In many problems, the solution is represented by an order of elements. In such cases
permutation representation is the most suited.
A classic example of this representation is the travelling salesman problem (TSP). In this the
salesman has to take a tour of all the cities, visiting each city exactly once and come back to the
starting city. The total distance of the tour has to be minimized. The solution to this TSP is
naturally an ordering or permutation of all the cities and therefore using a permutation
representation makes sense for this problem.
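The four encodings can be sketched as follows, reusing the illustrative problems from the text (0/1 knapsack, the four directions, TSP tours); the concrete gene values are made up for illustration:

```python
# One example chromosome per encoding type.
import random

binary_chrom = [1, 0, 1, 1, 0]       # binary: knapsack item i picked iff bit i == 1
real_chrom = [0.73, 1.52, -0.08]     # real valued: continuous decision variables

DIRECTIONS = ["North", "South", "East", "West"]
integer_chrom = [0, 2, 2, 1]         # integer: each gene indexes a direction

cities = ["A", "B", "C", "D"]
perm_chrom = random.sample(cities, len(cities))  # permutation: a TSP tour

print([DIRECTIONS[g] for g in integer_chrom])  # ['North', 'East', 'East', 'South']
print(sorted(perm_chrom) == cities)            # True: every city appears exactly once
```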

Fitness Function
The fitness function, simply defined, is a function which takes a candidate solution to the
problem as input and produces as output how "fit" or how "good" the solution is with respect
to the problem in consideration.
Calculation of fitness value is done repeatedly in a GA and therefore it should be sufficiently
fast. A slow computation of the fitness value can adversely affect a GA and make it
exceptionally slow.
In most cases the fitness function and the objective function are the same as the objective is to
either maximize or minimize the given objective function. However, for more complex
problems with multiple objectives and constraints, an Algorithm Designer might choose to have
a different fitness function.
A fitness function should possess the following characteristics −
 The fitness function should be sufficiently fast to compute.
 It must quantitatively measure how fit a given solution is or how fit individuals can be
produced from the given solution.
In some cases, calculating the fitness function directly might not be possible due to the inherent
complexities of the problem at hand. In such cases, we do fitness approximation to suit our
needs.
The following image shows the fitness calculation for a solution of the 0/1 Knapsack. It is a
simple fitness function which just sums the profit values of the items being picked (which have
a 1), scanning the elements from left to right till the knapsack is full.


Fig 1.2: Fitness Function
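The knapsack fitness just described can be sketched as follows; the profit and weight values and the capacity are illustrative assumptions:

```python
# 0/1 knapsack fitness: sum the profits of picked items (bits set to 1),
# scanning left to right and skipping items that would overfill the knapsack.

def knapsack_fitness(chrom, profits, weights, capacity):
    total_profit, total_weight = 0, 0
    for picked, p, w in zip(chrom, profits, weights):
        if picked and total_weight + w <= capacity:
            total_profit += p
            total_weight += w
    return total_profit

profits = [10, 5, 15, 7]
weights = [2, 3, 5, 7]
# Items 0 and 2 fit (weight 7 <= 10); item 3 would overfill and is skipped
print(knapsack_fitness([1, 0, 1, 1], profits, weights, capacity=10))  # 25
```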

Reproduction / Selection
Fitness Proportionate Selection is one of the most popular ways of parent selection. In this
every individual can become a parent with a probability which is proportional to its fitness.
Therefore, fitter individuals have a higher chance of mating and propagating their features to
the next generation. Therefore, such a selection strategy applies a selection pressure to the
more fit individuals in the population, evolving better individuals over time.
Following are some of the selection strategies:

Roulette Wheel Selection


In a roulette wheel selection, the circular wheel is divided into slices, one per individual, with
area proportional to fitness. A fixed point is chosen on the wheel circumference as shown and
the wheel is rotated. The region of the wheel which comes in front of the fixed point is chosen
as the parent. For the second parent, the same process is repeated.


Fig 1.3: Roulette Wheel Selection


Implementation wise, we use the following steps −
 Calculate S = the sum of all fitnesses.
 Generate a random number r between 0 and S.
 Starting from the top of the population, keep adding the fitnesses to the partial sum P,
till P < r.
 The individual for which P exceeds r is the chosen individual.
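The four steps can be sketched as follows (the population and fitness values are illustrative):

```python
# Roulette wheel selection: pick individuals with probability
# proportional to fitness.
import random

def roulette_select(population, fitnesses):
    S = sum(fitnesses)           # step 1: total fitness
    r = random.uniform(0, S)     # step 2: random point on the wheel
    P = 0.0
    for ind, f in zip(population, fitnesses):
        P += f                   # step 3: accumulate partial sums
        if P >= r:               # step 4: first individual whose P reaches r
            return ind
    return population[-1]        # guard against floating-point round-off

random.seed(1)
pop = ["A", "B", "C"]
fits = [1.0, 3.0, 6.0]
picks = [roulette_select(pop, fits) for _ in range(1000)]
print(picks.count("C") / 1000)  # roughly 0.6: C holds 6/10 of the wheel
```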

Tournament Selection
In K-Way tournament selection, we select K individuals from the population at random and
select the best out of these to become a parent. The same process is repeated for selecting the
next parent. Tournament Selection is also extremely popular in literature as it can even work
with negative fitness values.

Fig 1.4: Tournament selection

Rank Selection
Rank Selection also works with negative fitness values and is mostly used when the individuals
in the population have very close fitness values (this happens usually at the end of the run). This


leads to each individual having an almost equal share of the pie (like in case of fitness
proportionate selection) as shown in the following image and hence each individual no matter
how fit relative to each other has an approximately same probability of getting selected as a
parent. This in turn leads to a loss in the selection pressure towards fitter individuals, making
the GA make poor parent selections in such situations.
In this, we remove the concept of a fitness value while selecting a parent. However, every
individual in the population is ranked according to their fitness. The selection of the parents
depends on the rank of each individual and not the fitness. The higher ranked individuals are
preferred more than the lower ranked ones.
Chromosome Fitness Value Rank
A 8.1 1
B 8.0 4
C 8.05 2
D 7.95 6
E 8.02 3
F 7.99 5

Genetic Modeling: Inheritance Operator


In genetic algorithms, inheritance is the ability of modeled objects to mate, mutate (similar to
biological mutation), and propagate their problem solving genes to the next generation, in
order to produce an evolved solution to a particular problem. The selection of objects that will
be inherited from in each successive generation is determined by a fitness function, which
varies depending upon the problem being addressed.
The traits of these objects are passed on through chromosomes by a means similar to biological
reproduction. These chromosomes are generally represented by a series of genes, which in turn
are usually represented using binary numbers. This propagation of traits between generations
is similar to the inheritance of traits between generations of biological organisms. This process
can also be viewed as a form of reinforcement learning, because the evolution of the objects is
driven by the passing of traits from successful objects which can be viewed as a reward for their
success, thereby promoting beneficial traits.

Crossover Operator
Crossover is the process of taking two parent solutions and producing from them a child. After
the selection (reproduction) process, the population is enriched with better individuals.
Reproduction makes clones of good strings but does not create new ones. The crossover
operator is applied to the mating pool with the hope that it creates a better offspring.
Crossover is a recombination operator that proceeds in three steps:
a. The reproduction operator selects at random a pair of two individual strings for the
mating.
b. A cross site is selected at random along the string length.
c. Finally, the position values are swapped between the two strings following the cross
site.


Some of the crossover techniques are as follows:

One Point Crossover


In this one-point crossover, a random crossover point is selected and the tails of its two parents
are swapped to get new off-springs.

Multi Point Crossover


Multi point crossover is a generalization of the one-point crossover wherein alternating
segments are swapped to get new off-springs.

Uniform Crossover
In a uniform crossover, we don't divide the chromosome into segments; rather, we treat each
gene separately. In this, we essentially flip a coin for each gene to decide whether or
not it will be included in the off-spring. We can also bias the coin towards one parent, to have more
genetic material in the child from that parent.
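The three crossover variants can be sketched on two illustrative bit-string parents; the cut positions in the multi-point variant are fixed here for reproducibility:

```python
# One-point, multi-point and uniform crossover on bit-string parents.
import random

random.seed(0)
p1 = [1, 1, 1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0, 0, 0]

def one_point(p1, p2):
    cut = random.randint(1, len(p1) - 1)   # random cross site
    return p1[:cut] + p2[cut:]             # swap tails after the cut

def multi_point(p1, p2, cuts=(2, 5)):      # swap alternating segments
    child, src = [], 0
    parents = (p1, p2)
    for i in range(len(p1)):
        if i in cuts:
            src = 1 - src                  # switch source parent at each cut
        child.append(parents[src][i])
    return child

def uniform(p1, p2, bias=0.5):             # coin flip per gene
    return [a if random.random() < bias else b for a, b in zip(p1, p2)]

print(multi_point(p1, p2))  # [1, 1, 0, 0, 0, 1, 1, 1]
```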

Inversion and Deletion Operator


The inversion operator inverts the bits between two random sites.
Example: Chromosome 0100111 can be inverted to 0111001.
In deletion, two or three bits are selected at random and then deleted from the chromosome.
Example: Chromosome 0100111 after deletion would be 01001.

Mutation Operator
After a crossover is performed, mutation takes place. This is to prevent all solutions in the
population from falling into a local optimum of the solved problem. Mutation randomly changes the new

Page no: 8 Follow us on facebook to get real-time updates from RGPV


Downloaded from be.rgpvnotes.in

offspring. For binary encoding, we can switch a few randomly chosen bits from 1 to 0 or from 0
to 1. Mutation can then be the following:
Original offspring 1 1101111000011110
Original offspring 2 1101100100110110
Mutated offspring 1 1100111000011110
Mutated offspring 2 1101101100110110

The mutation depends on the encoding as well as the crossover. For example when we are
encoding permutations, mutation could be exchanging two genes.
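Bit-flip mutation can be sketched as follows; `flip_bit` reproduces the two example offspring from the table above, while `mutate` shows the probabilistic version:

```python
# Bit-flip mutation on binary-encoded chromosomes (strings of '0'/'1').
import random

def mutate(chrom, pm=0.05):
    """Flip each bit independently with probability pm."""
    return "".join(b if random.random() >= pm else ("0" if b == "1" else "1")
                   for b in chrom)

def flip_bit(chrom, i):
    """Flip a single bit at position i (0-based)."""
    return chrom[:i] + ("0" if chrom[i] == "1" else "1") + chrom[i + 1:]

print(flip_bit("1101111000011110", 3))  # 1100111000011110 (mutated offspring 1)
print(flip_bit("1101100100110110", 6))  # 1101101100110110 (mutated offspring 2)
```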

Bitwise operator
In this bit wise operator, we select one or more random bits and flip them. This is used for binary
encoded GAs.

Generation Cycle:
A Genetic Algorithm operates through a simple cycle of stages:
i) Creation of a population of strings,
ii) Evaluation of each string,
iii) Selection of the best strings, and
iv) Genetic manipulation to create a new population of strings.
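The four stages can be collapsed into one loop. The sketch below is illustrative, not a prescribed implementation: it assumes the OneMax fitness (count of 1-bits), binary tournament selection for stage iii, and one-point crossover plus bit-flip mutation for stage iv.

```python
import random

def run_ga(n_bits=20, pop_size=30, generations=40, pc=0.9, pm=0.02, seed=0):
    """One generation cycle: create -> evaluate -> select -> manipulate, repeated."""
    rng = random.Random(seed)
    fitness = lambda s: s.count("1")          # OneMax: count the 1-bits
    # (i) creation of a population of strings
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(pop_size)]
    for _ in range(generations):
        # (ii)+(iii) evaluation and selection: binary tournament
        def pick():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        # (iv) genetic manipulation: one-point crossover + bit-flip mutation
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < pc:
                cut = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                nxt.append("".join(
                    b if rng.random() >= pm else "10"[int(b)] for b in child))
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

best = run_ga()
print(best, best.count("1"))
```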

The cycle of a Genetic Algorithms is presented below:


Fig 1.5 Generation Cycle

Each cycle in Genetic Algorithms produces a new generation of possible solutions for a
given problem. In the first phase, an initial population, describing
representatives of the potential solution, is created to initiate the search
process. The elements of the population are encoded into bit-strings, called chromosomes. The
performance of the strings, often called fitness, is then evaluated with the help of some
functions, representing the constraints of the problem. Depending on the fitness of the
chromosomes, they are selected for a subsequent genetic manipulation process.
It should be noted that the selection process is mainly responsible for assuring survival of the
best-fit individuals. After selection of the population strings is over, the genetic manipulation
process consisting of two steps is carried out. In the first step, the crossover
operation that recombines the bits (genes) of each two selected strings
(chromosomes) is executed.

Convergence of GA
A genetic algorithm is usually said to converge when there is no significant improvement in the
values of fitness of the population from one generation to the next. Examples of stopping
criteria are, generally, time limits placed on the GA run, generation limits, or the algorithm
finding an individual with suitably low fitness, below a specified fitness threshold (in case we are
minimizing fitness). There is no defined difference between stopping criteria and convergence
criteria; the terms can be used interchangeably. Verifying that a GA has converged to the global
optimum of an NP-hard problem is impossible, unless you have a test data set for which the best


solution is already known. The best you can do is try out multiple runs of the GA with different
values of mutation and crossover probabilities, try out different fitness functions and crossover
operators, and try many variants of the simple GA such as elitism, multi-objective GAs, etc.
The common GA terminating conditions are:
 When a fixed number of generations is reached
 An optimal solution is obtained that satisfies the optimization criteria
 When successive GA iterations no longer produce better results
 Allocated budget (computational time / cost) reached
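These terminating conditions can be collected into a small helper; the function below is an illustrative sketch (the names, defaults, and the minimisation convention are assumptions, not from the text):

```python
def should_stop(generation, best_history, *,
                max_generations=200, fitness_goal=None, patience=25, tol=1e-6):
    """Common GA termination tests (best_history holds the best fitness per
    generation; minimisation is assumed)."""
    if generation >= max_generations:                  # fixed generation budget
        return True
    if fitness_goal is not None and best_history and best_history[-1] <= fitness_goal:
        return True                                    # goal-quality solution found
    if (len(best_history) > patience
            and best_history[-patience - 1] - best_history[-1] < tol):
        return True                                    # no significant improvement
    return False

print(should_stop(200, [5.0]))                   # → True: generation budget exhausted
print(should_stop(10, [5.0], fitness_goal=6.0))  # → True: 5.0 is below the goal
```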

Application of GA
Genetic Algorithms are primarily used in optimization problems of various kinds, but they are
frequently used in other application areas as well.
Following are some of the areas in which Genetic Algorithms are frequently used:
 Optimization − Genetic Algorithms are most commonly used in optimization problems
wherein we have to maximize or minimize a given objective function value under a
given set of constraints.
 Economics − GAs are also used to characterize various economic models like the
cobweb model, game theory equilibrium resolution, asset pricing, etc.
 Neural Networks − GAs are also used to train neural networks, particularly recurrent
neural networks.
 Parallelization − GAs also have very good parallel capabilities, prove to be a very
effective means of solving certain problems, and provide a good area for research.
 Image Processing − GAs are used for various digital image processing (DIP) tasks as well,
like dense pixel matching.
 Vehicle routing problems − with multiple soft time windows, multiple depots and a
heterogeneous fleet.
 Scheduling applications − GAs are used to solve various scheduling problems as well,
particularly the timetabling problem.
 Machine Learning − as already discussed, genetics-based machine learning (GBML) is a
niche area in machine learning.
 Robot Trajectory Generation − GAs have been used to plan the path which a robot arm
takes by moving from one point to another.
 Parametric Design of Aircraft − GAs have been used to design aircraft by varying the
parameters and evolving better solutions.
 DNA Analysis − GAs have been used to determine the structure of DNA using
spectrometric data about the sample.
 Multimodal Optimization − GAs are obviously very good approaches for multimodal
optimization in which we have to find multiple optimum solutions.
 Traveling salesman problem and its applications − GAs have been used to solve the
TSP, which is a well-known combinatorial problem, using novel crossover and packing
strategies.
Advances in GA: Implementation of GA using MATLAB


MATLAB has a wide variety of functions useful to the genetic algorithm practitioner. Given the
versatility of MATLAB's high-level language, problems can be coded in m-files in a fraction of the
time that it would take to create C or FORTRAN programs for the same purpose. Couple this
with MATLAB's advanced data analysis, visualization tools and special-purpose application
domain toolboxes, and the user is presented with a uniform environment with which to explore
the potential of genetic algorithms.
The Genetic Algorithm Toolbox uses MATLAB matrix functions to build a set of versatile tools for
implementing a wide range of genetic algorithm methods. The Genetic Algorithm Toolbox is a
collection of routines, written mostly in m-files, which implement the most important functions
in genetic algorithms.
The main data structures used by MATLAB in the Genetic Algorithm toolbox are:
• Chromosomes
• Objective function values
• Fitness values

Differences & similarities between GA & other traditional methods


GA is radically different from traditional optimization methods. Following are some of the
differences of GA with other optimization techniques:
1. GA works with a string coding of variables instead of the variables themselves, so the coding
discretizes the search space even though the function is continuous.
2. GA works with a population of points instead of a single point.
3. In GAs, previously found good information is emphasized using the reproduction operator and
propagated adaptively through the crossover and mutation operators.
4. GA does not require any auxiliary information except the objective function values.
5. GA uses probabilities in its operators. The way it narrows the search space as the search
progresses is adaptive, and this is a unique characteristic of Genetic Algorithms.


UNIT-5 Notes

Advanced soft computing techniques: Rough Set Theory-Introduction:


A rough set has been characterized in the literature in several equivalent ways:
1. The set of elements which lie between the lower and upper approximations of a crisp set,
according to the rough set theory of Pawlak.
2. A set in which the uncertainty is captured in the boundary region. It is approximated by a pair
of crisp sets called the lower and upper approximation of the set.
3. The concept of rough, or approximation, sets was introduced by Pawlak and is based on the
single assumption that information is associated with every object in an information system.
This information is expressed through attributes that describe the objects; objects that cannot
be distinguished on the basis of a selected attribute are referred to as indiscernible. A rough set
is defined by two sets, the lower approximation and the upper approximation.
4. A model, proposed by Pawlak, to capture imprecision in data through a boundary approach.
5. An approximation of a vague concept, through the use of two sets – the lower and upper
approximations.
6. Another efficient model to capture impreciseness, proposed by Z. Pawlak in 1982, which
follows the idea of G. Frege by defining a boundary region to capture uncertainty. It
approximates every set by a pair of crisp sets, called the lower and upper approximations.
7. A formal approximation of a crisp set in terms of a pair of sets that give the lower and upper
approximation of the original set.
8. An uncertainty-based model, introduced by Z. Pawlak in 1982, which captures uncertainty
through the boundary region concept introduced by G. Frege, the father of modern logic.


12. Rough set theory was initiated by Pawlak (1982). Let U be a universe and R be an
equivalence relation over U. This equivalence relation decomposes U into disjoint equivalence
classes. We denote the equivalence class of an element x with respect to R by [x]R, which is
defined as [x]R = { y | yRx }. Then for any X ⊆ U, we associate two crisp sets R̲X and R̄X, called
the lower and upper approximations of X with respect to R respectively, defined as
R̲X = { x ∈ U : [x]R ⊆ X } and R̄X = { x ∈ U : [x]R ∩ X ≠ ∅ }.
13. An approximation of a set in terms of a pair of sets called the lower approximation and the
upper approximation. The set difference between the lower and the upper approximations
gives the boundary region, which characterizes the uncertainty.
14. An imprecision model, introduced by Z. Pawlak in 1982, where sets are approximated by
two crisp sets with respect to equivalence relations. A set is rough with respect to an
equivalence relation or not depending upon whether its lower and upper approximations are
unequal or equal. Several extensions of this basic model also exist.
15. Another model of uncertainty, introduced by Pawlak in 1982, following the concept of Frege
on the boundary region model of uncertainty. Here, a set is approximated by a pair of crisp sets
called the lower and upper approximation of the set.

Set approximation
Let U be a finite and non-empty set called the universe, and let E ⊆ U × U denote an equivalence
relation on U. The pair apr = (U, E) is called an approximation space. The equivalence relation E
partitions the set U into disjoint subsets. This partition of the universe is called the quotient set
induced by E and is denoted by U/E. The equivalence relation is the available information or
knowledge about the objects under consideration. It represents a very special type of similarity
between elements of the universe. If two elements x, y in U belong to the same equivalence
class, we say that x and y are indistinguishable, i.e., they are similar. Each equivalence class may
be viewed as a granule consisting of indistinguishable elements. It is also referred to as an
equivalence granule. The granulation structure induced by an equivalence relation is a partition
of the universe.
An arbitrary set X ⊆ U may not necessarily be a union of some equivalence classes. This implies
that one may not be able to describe X precisely using the equivalence classes of E. In this case,
one may characterize X by a pair of lower and upper approximations:
apr̲(X) = ∪{ [x]E | x ∈ U, [x]E ⊆ X },
apr̄(X) = ∪{ [x]E | x ∈ U, [x]E ∩ X ≠ ∅ },
where [x]E = { y | y ∈ U, xEy }
is the equivalence class containing x. Both lower and upper approximations are unions of some
equivalence classes. More precisely, the lower approximation apr̲(X) is the union of those
equivalence granules which are subsets of X. The upper approximation apr̄(X) is the union of
those equivalence granules which have a non-empty intersection with X.
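The granule-oriented definitions translate directly into code. A small sketch, assuming the partition U/E is given explicitly as a list of sets (the example universe and target set are illustrative):

```python
def approximations(partition, X):
    """Granule-oriented lower and upper approximations of X for a partition of U."""
    X = set(X)
    lower = {e for g in partition if g <= X for e in g}  # granules inside X
    upper = {e for g in partition if g & X for e in g}   # granules meeting X
    return lower, upper

# U = {1..6} partitioned into indiscernibility granules
partition = [{1, 2}, {3, 4}, {5, 6}]
X = {2, 3, 4}
lower, upper = approximations(partition, X)
print(lower)   # → {3, 4}: the only granule fully inside X
print(upper)   # → {1, 2, 3, 4}: granules intersecting X
print(len(lower) / len(upper))  # → 0.5, a simple accuracy measure
```

Note that the lower approximation is contained in X, which in turn is contained in the upper approximation, as property (II) below requires.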


In addition to the equivalence class oriented definition, i.e., the granule oriented definition, we
can have an element oriented definition:
apr̲(X) = { x | x ∈ U, [x]E ⊆ X },
apr̄(X) = { x | x ∈ U, [x]E ∩ X ≠ ∅ }.
An element x ∈ U belongs to the lower approximation of X if all its equivalent elements belong
to X. It belongs to the upper approximation of X if at least one of its equivalent elements
belongs to X. This interpretation of the approximation operators is related to the interpretation
of the necessity and possibility operators in modal logic.
Lower and upper approximations are dual to each other in the sense:
(Ia) apr̲(X) = (apr̄(Xᶜ))ᶜ,
(Ib) apr̄(X) = (apr̲(Xᶜ))ᶜ,
where Xᶜ = U − X is the complement of X. The set X lies within its lower and upper
approximations:
(II) apr̲(X) ⊆ X ⊆ apr̄(X).
Intuitively, the lower approximation may be understood as the pessimistic view and the upper
approximation the optimistic view in approximating a set. One can verify the following
properties:
(IIIa) apr̲(X ∩ Y) = apr̲(X) ∩ apr̲(Y),
(IIIb) apr̄(X ∪ Y) = apr̄(X) ∪ apr̄(Y).
The lower approximation of the intersection of a finite number of sets can be obtained from
their lower approximations. A similar observation is true for the upper approximation of the
union of a finite number of sets. However, we only have:
(IVa) apr̲(X ∪ Y) ⊇ apr̲(X) ∪ apr̲(Y),
(IVb) apr̄(X ∩ Y) ⊆ apr̄(X) ∩ apr̄(Y).
One cannot obtain the lower approximation of the union of some sets from their lower
approximations, nor obtain the upper approximation of the intersection of some sets from their
upper approximations.
The accuracy of a rough set approximation may be viewed as an inverse of the MZ metric when
applied to the lower and upper approximations. In other words, the distance between the lower
and upper approximations determines the accuracy of the rough set approximation.

Rough membership
Rough sets can also be defined, as a generalization, by employing a rough membership function
instead of the objective approximation. The rough membership function expresses the
conditional probability that x belongs to X given R. It can be interpreted as the degree to which
x belongs to X in terms of the information about x expressed by R.
Rough membership primarily differs from the fuzzy membership in that the membership of
union and intersection of sets cannot, in general, be computed from their constituent
membership as is the case of fuzzy sets. In this, rough membership is a generalization of fuzzy
membership. Furthermore, the rough membership function is grounded more in probability
than the conventionally held concepts of the fuzzy membership function.
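As a sketch of the idea, assuming the standard definition μ_X(x) = |[x] ∩ X| / |[x]| with the indiscernibility partition given explicitly (the example data is illustrative):

```python
def rough_membership(x, X, partition):
    """Conditional probability that an element of x's granule belongs to X."""
    granule = next(g for g in partition if x in g)  # the equivalence class [x]
    return len(granule & set(X)) / len(granule)

partition = [{1, 2}, {3, 4}, {5, 6}]
X = {2, 3, 4}
print(rough_membership(1, X, partition))  # → 0.5: half of {1, 2} lies in X
print(rough_membership(3, X, partition))  # → 1.0: {3, 4} lies entirely in X
print(rough_membership(5, X, partition))  # → 0.0: {5, 6} misses X
```

Elements with membership 1 form the lower approximation, and those with membership greater than 0 form the upper approximation, so this function generalizes the set approximations above.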

Attributes
Rough set theory is useful for rule induction from incomplete data sets. Using this approach we
can distinguish between three types of missing attribute values: lost values (the values that
were recorded but currently are unavailable), attribute-concept values (these missing attribute
values may be replaced by any attribute value limited to the same concept), and "do not care"
conditions (the original values were irrelevant). A concept (class) is a set of all objects classified
(or diagnosed) the same way.
Two special data sets with missing attribute values were extensively studied: in the first case, all
missing attribute values were lost, in the second case, all missing attribute values were "do not
care" conditions.
In the attribute-concept value interpretation of a missing attribute value, the missing attribute
value may be replaced by any value of the attribute domain restricted to the concept to which
the object with the missing attribute value belongs. For example, suppose that for a patient the
value of the attribute Temperature is missing, this patient is sick with flu, and all remaining
patients sick with flu have values high or very-high for Temperature. Then, using the
interpretation of the missing attribute value as an attribute-concept value, we replace the
missing value by high and very-high. Additionally, the characteristic relation enables processing
of data sets with all three kinds of missing attribute values at the same time: lost, "do not care"
conditions, and attribute-concept values.

Optimization
Since the development of rough sets, extensions and generalizations have continued to evolve.
Initial developments focused on the relationship - both similarities and difference - with fuzzy
sets. While some literature contends these concepts are different, other literature considers
that rough sets are a generalization of fuzzy sets - as represented through either fuzzy rough
sets or rough fuzzy sets. Pawlak (1995) considered that fuzzy and rough sets should be treated
as being complementary to each other, addressing different aspects of uncertainty and
vagueness.
Three notable extensions of rough set theory are:
1. Dominance-based rough set approach (DRSA) is an extension of rough set theory for multi
criteria decision analysis (MCDA). The main change in this extension of classical rough sets is
the substitution of the indiscernibility relation by a dominance relation, which permits the
formalism to deal with inconsistencies typical in consideration of criteria and preference-
ordered decision classes.
2. Decision-theoretic rough sets (DTRS) are a probabilistic extension of rough set theory
introduced by Yao, Wong, and Lingras (1990). It utilizes a Bayesian decision procedure for


minimum-risk decision making. Elements are included in the lower and upper approximations
based on whether their conditional probability is above thresholds α and β. These upper and
lower thresholds determine region inclusion for elements. This model is unique and powerful
since the thresholds themselves are calculated from a set of six loss functions representing
classification risks.
3. Game-theoretic rough sets (GTRS) are a game-theory-based extension of rough sets that was
introduced by Herbert and Yao (2011). It utilizes a game-theoretic environment to optimize
certain criteria of rough sets based classification or decision making in order to obtain effective
region sizes.

SVM – Introduction
Support Vector Machines are based on the concept of decision planes that define decision
boundaries. A decision plane is one that separates between a set of objects having different
class memberships. A schematic example is shown in the illustration below. In this example, the
objects belong either to class GREEN or RED. The separating line defines a boundary on the
right side of which all objects are GREEN and to the left of which all objects are RED. Any new
object (white circle) falling to the right is labeled, i.e., classified, as GREEN (or classified as RED
should it fall to the left of the separating line).

Fig 1.6 SVM plane 1

The above is a classic example of a linear classifier, i.e., a classifier that separates a set of
objects into their respective groups (GREEN and RED in this case) with a line. Most classification
tasks, however, are not that simple, and often more complex structures are needed in order to
make an optimal separation, i.e., correctly classify new objects (test cases) on the basis of the
examples that are available (train cases). This situation is depicted in the illustration below.
Compared to the previous schematic, it is clear that a full separation of the GREEN and RED
objects would require a curve (which is more complex than a line). Classification tasks based on
drawing separating lines to distinguish between objects of different class memberships are
known as hyperplane classifiers. Support Vector Machines are particularly suited to handle such
tasks.

Fig 1.7 SVM plane 2


The illustration below shows the basic idea behind Support Vector Machines. Here we see the
original objects (left side of the schematic) mapped, i.e., rearranged, using a set of
mathematical functions, known as kernels. The process of rearranging the objects is known as
mapping (transformation). Note that in this new setting, the mapped objects (right side of the
schematic) is linearly separable and, thus, instead of constructing the complex curve (left
schematic), all we have to do is to find an optimal line that can separate the GREEN and the RED
objects.

Fig 1.8 SVM plane 3
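The mapping idea can be illustrated on one-dimensional data; the points and the polynomial feature map x ↦ (x, x²) below are illustrative assumptions, not taken from the figure:

```python
# Points on a line that no single threshold can separate:
red = [-3, -2, 2, 3]   # outer points
green = [-1, 0, 1]     # inner points

phi = lambda x: (x, x * x)  # a simple polynomial feature map (kernel-style lift)

# After mapping, the second coordinate alone separates the classes linearly:
threshold = 2.0
assert all(phi(x)[1] > threshold for x in red)
assert all(phi(x)[1] < threshold for x in green)
print("linearly separable after mapping: x^2 compared with", threshold)
```

This is exactly the picture above: an optimal curve in the original space becomes an optimal straight line in the mapped space.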

Obtaining the optimal hyper plane


We first compute the distance ∥p∥ between a point A and a hyperplane. We then compute the
margin, which is equal to 2∥p∥.
However, even if it did quite a good job at separating the data it was not the optimal
hyperplane.

Fig 1.9 The margin computed above is shown as M1

As noted earlier, the optimal hyperplane is the one which maximizes the margin of the
training data.


In Figure 1.9, we can see that the margin M1, delimited by the two blue lines, is not the biggest
margin separating perfectly the data. The biggest margin is the margin M2 shown in Figure 1.10
below.

Fig 1.10 The optimal hyperplane lies slightly to the left of the initial one.

You can also see the optimal hyperplane in Figure 1.10. It lies slightly to the left of our initial
hyperplane. How do we find it? We simply trace a line crossing M2 in its middle.
If I have a hyperplane I can compute its margin with respect to some data point. If I have a
margin delimited by two hyperplanes, I can find a third hyperplane passing right in the middle
of the margin.
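The quantities involved can be sketched numerically; the hyperplane w·x + b = 0 and the test point below are illustrative choices:

```python
import math

def distance(w, b, point):
    """Perpendicular distance |w·x + b| / ||w|| from a point to hyperplane w·x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return abs(dot + b) / math.hypot(*w)

# Two parallel hyperplanes w·x + b = ±1 delimit a margin of width 2/||w||;
# the separating hyperplane w·x + b = 0 passes through the middle of that margin.
w, b = (3.0, 4.0), -5.0
print(distance(w, b, (0.0, 0.0)))  # → 1.0  (|0 - 5| / 5)
print(2 / math.hypot(*w))          # → 0.4  (margin width for this w)
```

Maximizing the margin 2/∥w∥ is what singles out the optimal hyperplane among all hyperplanes that separate the data.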

Linear and nonlinear SVM classifiers


A method is (very basically) linear if its classification threshold is linear (a line, a plane or a
hyperplane). Otherwise, it is non-linear. Obviously, linear methods involve only linear
combinations of the data, leading to easier implementations, etc.
In principle, both ANNs and SVMs are non-linear because they use, in general, non-linear
functions of the data (the activation function in an ANN or the kernel in an SVM is typically a
non-linear function). However, in both cases you could use linear functions if the problem is
linearly separable.
In most useful cases a non-linear technique is required, but a linear one is desired: you might
use a suboptimal (linear) classifier if the resulting error is acceptable compared with the
complexity of a non-linear implementation.
We use Linear and non-Linear classifier under following conditions:
1. If accuracy is more important to you than the training time, then use a non-linear classifier,
else use a linear classifier. This is because the linear classifier uses linear kernels, which are
faster than the non-linear kernels used in the non-linear classifier.


2. A linear classifier (SVM) is used when the number of features is very high, e.g., document
classification. This is because a linear SVM gives almost similar accuracy to a non-linear SVM but
is very fast in such cases.
3. Use non-linear classifier when data is not linearly separable. Under such conditions, linear
classifiers give very poor results (accuracy) and non-linear gives better results. This is because
non-linear Kernels map (transform) the input data (Input Space) to higher dimensional space
(called Feature Space) where a linear hyperplane can be easily found.

Introduction to Swarm Intelligence


Swarm Intelligence is a new subset of Artificial Intelligence (AI) designed to manage a group of
connected machines. We are now entering the age of intelligent machines, also called the
Internet of Things (IoT), where more and more devices are being connected every day. Swarm
intelligence is quickly emerging as a way this connectivity can be harnessed and put to good
use.
Swarm intelligence (as the name suggests) comes from mimicking nature. Swarms of social
insects, such as ants and bees, operate using a collective intelligence that is greater than any
individual member of the swarm. Swarms are therefore highly effective problem-solving groups
that can easily deal with the loss of individual members while still completing the task at hand --
a capability that is very desirable for a huge number of applications. Today this concept is being
applied in concert with machine learning and distributed computing systems. The result is a
group of connected machines that can communicate, coordinate, learn and adapt to reach a
specific goal, as has been demonstrated, for example, by applying swarm intelligence to a group
of drones.

Swarm Intelligence Techniques: Ant Colony Optimization


Ant colony optimization (ACO), introduced by Dorigo in his doctoral dissertation, is a class of
optimization algorithms modeled on the actions of an ant colony. ACO is a probabilistic
technique useful in problems that deal with finding better paths through graphs. Artificial
'ants'—simulation agents—locate optimal solutions by moving through a parameter space
representing all possible solutions. Natural ants lay down pheromones directing each other to
resources while exploring their environment. The simulated 'ants' similarly record their
positions and the quality of their solutions, so that in later simulation iterations more ants
locate better solutions.
The use of swarm intelligence in telecommunication networks has also been researched, in the
form of ant-based routing. This was pioneered separately by Dorigo et al. and Hewlett Packard
in the mid-1990s, with a number of variations since. Basically, this uses a probabilistic routing
table rewarding/reinforcing the route successfully traversed by each "ant" (a small control
packet) which floods the network. Reinforcement of the route in the forward direction, in the
reverse direction, and in both simultaneously has been researched: backward reinforcement
requires a symmetric network and couples the two directions together; forward reinforcement
rewards a route before the outcome is known (but then one would pay for the cinema before
one knows how good the film is). As the system behaves stochastically and therefore lacks
repeatability, there are large hurdles to commercial deployment.
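The pheromone mechanism can be sketched deterministically on two candidate routes. This toy model uses an expected-value update rather than individually sampled ants, and the constants are assumed for illustration; it is not Dorigo's full algorithm:

```python
# Two candidate routes from nest to food; positive feedback should
# concentrate pheromone on the shorter one.
lengths = {"short": 1.0, "long": 2.0}
tau = {p: 1.0 for p in lengths}   # initial pheromone on each route
rho, Q = 0.1, 1.0                 # evaporation rate and deposit constant

for _ in range(50):
    # choice probability ∝ pheromone × heuristic desirability (1/length)
    attract = {p: tau[p] * (1.0 / lengths[p]) for p in lengths}
    total = sum(attract.values())
    prob = {p: a / total for p, a in attract.items()}
    # evaporate, then deposit Q/length weighted by how often each route is chosen
    for p in lengths:
        tau[p] = (1 - rho) * tau[p] + prob[p] * (Q / lengths[p])

print(tau["short"] > tau["long"])  # → True: the short path accumulates more pheromone
```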


Mobile media and new technologies have the potential to change the threshold for collective
action due to swarm intelligence.
The location of transmission infrastructure for wireless communication networks is an
important engineering problem involving competing objectives. A minimal selection of
locations (or sites) is required subject to providing adequate area coverage for users. A very
different, ant-inspired swarm intelligence algorithm, stochastic diffusion search (SDS), has been
successfully used to provide a general model for this problem, related to circle packing and set
covering. It has been shown that the SDS can be applied to identify suitable solutions even for
large problem instances.

Particle Swarm Optimization


Particle swarm optimization (PSO) is a global optimization algorithm for dealing with problems
in which a best solution can be represented as a point or surface in an n-dimensional space.
Hypotheses are plotted in this space and seeded with an initial velocity, as well as a
communication channel between the particles. Particles then move through the solution space,
and are evaluated according to some fitness criterion after each time step. Over time, particles
are accelerated towards those particles within their communication grouping which have better
fitness values. The main advantage of such an approach over other global minimization
strategies such as simulated annealing is that the large numbers of members that make up the
particle swarm make the technique impressively resilient to the problem of local minima.
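A minimal sketch of the update rule described above, minimising the sphere function; the inertia and acceleration constants are common illustrative defaults, not prescribed values:

```python
import random

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm minimising f over the box [-10, 10]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best-seen position
    gbest = min(pbest, key=f)              # best position in the communication group
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):    # evaluate fitness after the time step
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
print(best, sphere(best))
```

Here "acceleration towards better particles" is the c1/c2 terms, and the swarm's shared `gbest` plays the role of the communication channel between particles.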
Nanoparticles are bioengineered particles that can be injected into the body and operate as a
system to do things drug treatments cannot. The primary problem with all of our current cancer
treatments is most procedures target healthy cells in addition to tumors, causing a whole host
of side effects. Nanoparticles by comparison, are custom designed to accumulate ONLY in
tumors, while avoiding healthy tissue.
Nanoparticles can be designed to move, sense, and interact with their environment, just like
robots. In medicine, we call this embodied intelligence. The challenge thus far has been
figuring out how to properly "program" this embodied intelligence to ensure it produces the
desired outcome.
Swarms are very effective when a group of individual elements (nanoparticles in this case)
begin reacting as a group to local information. Swarm intelligence is emerging as the key that
will unlock the true potential of these tiny helpers. Researchers are now reaching out to the
gaming community in an effort to crowd-source the proper programming for swarms of
nanoparticles.

Bee Colony Optimization


The Artificial Bee Colony (ABC) algorithm is a swarm based meta-heuristic algorithm that was
introduced by Karaboga in 2005 (Karaboga, 2005) for optimizing numerical problems. It was
inspired by the intelligent foraging behavior of honey bees. The algorithm is specifically based
on the model proposed by Tereshko and Loengarov (2005) for the foraging behavior of honey
bee colonies. The model consists of three essential components: employed and unemployed
foraging bees, and food sources. The first two components, employed and unemployed
foraging bees, search for rich food sources (the third component) close to their hive.
The model also defines two leading modes of behavior which are necessary for self-organizing


and collective intelligence: recruitment of foragers to rich food sources resulting in positive
feedback and abandonment of poor sources by foragers causing negative feedback.
In ABC, a colony of artificial forager bees (agents) search for rich artificial food sources (good
solutions for a given problem). To apply ABC, the considered optimization problem is first
converted to the problem of finding the best parameter vector which minimizes an objective
function. Then, the artificial bees randomly discover a population of initial solution vectors and
then iteratively improve them by employing the strategies: moving towards better solutions by
means of a neighbor search mechanism while abandoning poor solutions.
The general scheme of the ABC algorithm is as follows:

Initialization Phase


REPEAT
1. Employed Bees Phase
2. Onlooker Bees Phase
3. Scout Bees Phase
4. Memorize the best solution achieved so far
UNTIL (Cycle=Maximum Cycle Number or a Maximum CPU time)
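The scheme above can be sketched compactly. This illustrative Python version minimises a numeric objective and makes assumed choices (the search range, the fitness transform 1/(1+f), and the abandonment limit) that the scheme itself does not fix:

```python
import random

def abc_minimise(f, dim=2, n_food=10, iters=100, limit=10, seed=0):
    """Employed, onlooker and scout phases of the ABC cycle, repeated."""
    rng = random.Random(seed)
    new_src = lambda: [rng.uniform(-5, 5) for _ in range(dim)]
    food = [new_src() for _ in range(n_food)]      # one source per employed bee
    trials = [0] * n_food                          # stagnation counters

    def try_improve(i):
        """Neighbour search: perturb one dimension towards a random partner."""
        k, d = rng.randrange(n_food), rng.randrange(dim)
        cand = food[i][:]
        cand[d] += rng.uniform(-1, 1) * (food[i][d] - food[k][d])
        if f(cand) < f(food[i]):
            food[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):                    # 1. employed bees phase
            try_improve(i)
        fits = [1.0 / (1.0 + f(s)) for s in food]  # richer source -> larger share
        total = sum(fits)
        for _ in range(n_food):                    # 2. onlooker bees phase (roulette)
            r, acc = rng.random() * total, 0.0
            for i, fit in enumerate(fits):
                acc += fit
                if acc >= r:
                    break
            try_improve(i)
        for i in range(n_food):                    # 3. scout bees phase
            if trials[i] > limit:
                food[i], trials[i] = new_src(), 0
    return min(food, key=f)                        # 4. best solution achieved so far

best = abc_minimise(lambda x: sum(v * v for v in x))
print(best)
```

Recruitment to rich sources (the roulette in the onlooker phase) gives the positive feedback, and abandonment by scouts gives the negative feedback described above.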
