Soft Computing Notes

Crisp vs. fuzzy:
A crisp set gives a definite yes or no: an element either belongs to the set or it does not.
A fuzzy set gives no definite yes or no; membership is a matter of degree.
E.g.: "How is the weather today?" The answer can be graded: normal, hot, very hot,
extremely hot, each with a membership grade between 0 and 1 (e.g. 0.8 for "extremely hot").
Fuzzy set theory forms the basis of fuzzy logic.
Crisp set
Universe of discourse: - The universal set is the set which, with reference to a
particular context, contains all possible elements having the same characteristics
and from which sets can be formed. The universal set is usually denoted by E.
E.g.: The universal set of all students in a university.
Set: - A set is a well defined collection of objects. An object either belongs to or
does not belong to the set. A set in certain contexts may be associated with the
universal set from which it is derived.
Given A = {a1, a2, a3, ..., an}, the elements a1, a2, a3, ... are the members of the set.
This representation is known as list form.
E.g.: A= {Gandhi, Bose, Nehru}
A set may also be defined based on the properties the members have to satisfy.
E.g.: A={x | P(x)}
P is the property to be satisfied by x; P(x) is also known as the characteristic
function.
Pictorially set can be represented by Venn diagrams:
Membership: - An element x is said to be a member of set A if x belongs to the
set A.
Cardinality: - The number of elements in the set is called its cardinality.
Cardinality is denoted by n(A), |A| or #A.
E.g.: A= {1, 2, 3, 4, 5} |A|=5
Family of sets: - A set whose members are themselves sets is referred to as a family of
sets.
E.g.: A = {{1, 2, 3}, {4, 5, 6}}
Null set: - The empty set, denoted by ∅ or { }.
Singleton set: - A set having exactly one element a. A singleton set is denoted by
{a} and is the simplest example of a nonempty set.
Subset: - A subset is a portion of a set. B is a subset of A (written B ⊆ A) if every
member of B is a member of A.
Fuzzy Set:-
Fuzzy sets allow a flexible sense of membership of elements to a set. In crisp set theory an
element either belongs to or does not belong to a set. In fuzzy set theory many degrees
of membership (between 0 and 1) are allowed.
Membership function μA(x):
Since many degrees of membership are allowed, a membership function is associated
with a fuzzy set A:
μA(x): X → [0, 1]
Definition of Fuzzy set:-
If X is a universe of discourse and x is a particular element of X, then a fuzzy set A
defined on X may be written as a collection of ordered pairs
A = {(x, μA(x)), x ∈ X}
where each pair (x, μA(x)) is called a singleton.
An alternative definition, which expresses the fuzzy set as the union of all singletons μA(x)/x,
is given by: A = Σ μA(xi)/xi in the discrete case,
and A = ∫X μA(x)/x in the continuous case.
Membership Function:-
The membership values need not always be discrete; the membership function may be a
continuous function and may be expressed by a mathematical formula μA(x) in terms of x.
E.g.: Consider a population divided into the following age groups:
0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70 and above.
Membership functions can be defined over these age groups representing the fuzzy sets
Young, Middle aged and Old.
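To make the idea concrete, a fuzzy set over a discrete universe can be stored simply as a mapping from elements to membership grades. A minimal Python sketch follows; the age/membership pairs are illustrative values chosen here, not taken from the notes.

# A discrete fuzzy set is a mapping x -> membership grade in [0, 1].
# The age/membership pairs below are illustrative, not from the notes.
young = {5: 1.0, 15: 1.0, 25: 0.8, 35: 0.5, 45: 0.2, 55: 0.0}

def membership(fuzzy_set, x):
    """Return the degree to which x belongs to the fuzzy set (0 if unlisted)."""
    return fuzzy_set.get(x, 0.0)

print(membership(young, 25))   # 0.8 -> partial membership
print(membership(young, 55))   # 0.0 -> no membership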
Lecture-2
Basic Fuzzy Set Theory:-
Let X be the universe of discourse and let A and B be two fuzzy sets defined on X,
with membership functions μA and μB.
1. Union: The union of fuzzy sets A and B is a new fuzzy set A ∪ B with membership function
μA∪B(x) = max(μA(x), μB(x))
Let A = {(x1, 0.5), (x2, 0.7), (x3, 0)} and B = {(x1, 0.8), (x2, 0.2), (x3, 1)}.
A ∪ B = {(x1, 0.8), (x2, 0.7), (x3, 1)}
2. Intersection: The intersection of fuzzy sets A and B is a new fuzzy set A ∩ B with membership function
μA∩B(x) = min(μA(x), μB(x))
3. Complement: The complement of fuzzy set A is a new fuzzy set A^c with membership
function
μA^c(x) = 1 - μA(x)
A = set of young.
A^c = set of not young.
A = {(x1, 0.3), (x2, 0.7), (x3, 0.8)}
A^c = {(x1, 0.7), (x2, 0.3), (x3, 0.2)}
4. Product of two fuzzy sets: The product of two fuzzy sets A and B is a new fuzzy
set A·B with membership function
μA·B(x) = μA(x)·μB(x)
A = {(x1, 0.2), (x2, 0.8)}
B = {(x1, 0.1), (x2, 0)}
A·B = {(x1, 0.02), (x2, 0)}
5. Equality: Two fuzzy sets A and B are equal (A = B) if μA(x) = μB(x) for all x.
E.g.: A = {(x1, 0.2), (x2, 0.8)}
B = {(x1, 0.6), (x2, 0.8)}
C = {(x1, 0.2), (x2, 0.8)}
A ≠ B, but A = C
6. Product of a fuzzy set with a crisp number: Multiplying a fuzzy set A
by a crisp number a results in a new fuzzy set a·A with membership
function
μa·A(x) = a·μA(x)
E.g.: A = {(x1, 0.4), (x2, 0.6), (x3, 0.8)}
a = 0.3
a·A = {(x1, 0.12), (x2, 0.18), (x3, 0.24)}
7. Power of a fuzzy set: The α power of a fuzzy set A is a new fuzzy set A^α
whose membership function is given by
μA^α(x) = (μA(x))^α
Raising a fuzzy set to its 2nd power is called Concentration (CON).
Taking the square root is called Dilation (DIL).
E.g.: A = {(x1, 0.4), (x2, 0.2)}, α = 2
A^2 = {(x1, 0.16), (x2, 0.04)}
8. Difference: The difference of two fuzzy sets A and B is a new fuzzy set A - B
defined as:
A - B = A ∩ B^c
E.g.: A = {(x1, 0.2), (x2, 0.5), (x3, 0.6)}
B = {(x1, 0.1), (x2, 0.4), (x3, 0.5)}
B^c = {(x1, 0.9), (x2, 0.6), (x3, 0.5)}
A - B = {(x1, 0.2), (x2, 0.5), (x3, 0.5)}
9. Disjunctive sum: The disjunctive sum of two fuzzy sets A and B is a new
fuzzy set A ⊕ B defined as:
A ⊕ B = (A^c ∩ B) ∪ (A ∩ B^c)
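A minimal Python sketch of the operations listed above on discrete fuzzy sets, assuming both sets are defined over the same elements; the numbers reuse the union example from this lecture.

# Standard fuzzy set operations on discrete fuzzy sets.
A = {'x1': 0.5, 'x2': 0.7, 'x3': 0.0}
B = {'x1': 0.8, 'x2': 0.2, 'x3': 1.0}

union        = {x: max(A[x], B[x])      for x in A}   # {x1: 0.8, x2: 0.7, x3: 1.0}
intersection = {x: min(A[x], B[x])      for x in A}
complement_A = {x: 1.0 - A[x]           for x in A}
product      = {x: A[x] * B[x]          for x in A}
power2_A     = {x: A[x] ** 2            for x in A}   # concentration (CON)
difference   = {x: min(A[x], 1 - B[x])  for x in A}   # A - B = A ∩ B^c

print(union)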
Lecture-3
Properties of Fuzzy Sets:-
Any fuzzy sets A, B and C defined on a universe X satisfy the following properties:
(1) Commutativity: A ∪ B = B ∪ A, A ∩ B = B ∩ A
(2) Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
(3) Distributivity: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(4) Idempotence: A ∪ A = A, A ∩ A = A
(5) Identity: A ∪ ∅ = A, A ∪ X = X
A ∩ ∅ = ∅, A ∩ X = A
(6) Transitivity: if A ⊆ B ⊆ C then A ⊆ C
(7) Involution: (A^c)^c = A
(8) De Morgan's Laws: (A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c
Since fuzzy sets can overlap, the law of the excluded middle and the law of contradiction do
not hold. This is expressed below:
A ∪ A^c ≠ X (law of the excluded middle does not hold)
A ∩ A^c ≠ ∅ (this is called the law of contradiction)
Fuzzy relations: A fuzzy relation is a fuzzy set defined on the Cartesian product of crisp sets
X1, X2, ..., Xn, where the n-tuples (x1, x2, ..., xn) may have varying degrees of
membership; the membership value indicates the strength of the relation.
E.g.: X1 = {Typ, Sura, Cold}
X2 = {..., High temp, Shining}
The fuzzy relation may be defined by the relation matrix:
         ...   High temp   Shining
Typ      0.1      0.4        0.8
Sura     0.2      0.9        0.7
Cold     0.9      0.4        0.6
Fuzzy Cartesian product:
Let A and B be fuzzy sets on X and Y respectively. The Cartesian product, indicated
as A × B and resulting in a fuzzy relation R on the product space X × Y, is given by:
R = A × B ⊆ X × Y
where μR(x, y) = μA×B(x, y) = min(μA(x), μB(y))
Lecture-4
Operations on fuzzy relations:
Let R and S be fuzzy relations on X × Y.
Union: μR∪S(x, y) = max(μR(x, y), μS(x, y))
Intersection: μR∩S(x, y) = min(μR(x, y), μS(x, y))
Complement: μR^c(x, y) = 1 - μR(x, y)
Composition of relations:
Suppose R is a fuzzy relation defined on X × Y and S is a fuzzy relation
defined on Y × Z; then the composition R ∘ S is defined by
μR∘S(x, z) = max over y ∈ Y of min(μR(x, y), μS(y, z))
Cartesian product and Co-product:
Let A and B be fuzzy sets in X and Y respectively. The Cartesian product of A
and B is a fuzzy set on the product space X × Y with the membership function
μA×B(x, y) = min(μA(x), μB(y))
Similarly, the Cartesian co-product A + B is a fuzzy set with the membership
function
μA+B(x, y) = max(μA(x), μB(y))
Fuzzy set:
It is a set without a crisp boundary. The transition from belong to a set to not to
belong to a set is gradual and this smooth transition is characterized by
membership functions. Membership function gives fuzzy sets flexibility in
modeling commonly used linguistic expressions, such as "the water is hot" or "the
temperature is high". Such imprecision plays an important role in human thinking,
particularly in the domain of pattern recognition, communication of information
and abstraction.
Fuzziness is due to uncertain and imprecise nature of abstract thoughts and
concept.
It is not because of randomness of the constituent member of the set.
Fuzzy set expresses the degree to which an element belongs to a set.
Construction of a fuzzy set depends on:
(i) identification of suitable universe of discourse
(ii) the specification of an appropriate membership function
Specification of membership function is subjective i.e. membership function
defined by different persons may vary considerably for the same concept
In practice, when the universe of discourse X is a continuous space, we usually
partition X into several fuzzy set whose MFs cover X in more or less uniform
manner.
The fuzzy sets usually carry names that conform to adjectives appearing in our daily
linguistic usage, such as "large", "medium", "small"; these are called linguistic values or
linguistic labels, and the variable they describe is often called a linguistic
variable.
Common nomenclature:-
1. Support: The support of a fuzzy set A is the set of all points x in X such
that μA(x) > 0:
Support(A) = {x | μA(x) > 0}
2. Core: The core of a fuzzy set A is the set of all points x in X such that
μA(x) = 1:
Core(A) = {x | μA(x) = 1}
3. Normality: A fuzzy set A is normal if its core is non empty, i.e. there is
always a point x ∈ X such that μA(x) = 1.
4. Crossover points: A crossover point of a fuzzy set A is a point x ∈ X at
which μA(x) = 0.5:
Crossover(A) = {x | μA(x) = 0.5}
5. Fuzzy singleton: A fuzzy set whose support is a single point in X with
μA(x) = 1 is called a fuzzy singleton.
6. α-cut: The α-cut of a fuzzy set A is the crisp set defined by
A_α = {x | μA(x) ≥ α}
Strong α-cut:
A_α' = {x | μA(x) > α}
Hence the support and core of a fuzzy set A can be expressed as
Support(A) = A_0'
and
Core(A) = A_1
respectively.
7. Convexity: A fuzzy set A is convex if and only if for any x1, x2 ∈ X and
any λ ∈ [0, 1],
Equation (1): μA(λx1 + (1-λ)x2) ≥ min{μA(x1), μA(x2)}
Alternatively, A is convex if all its α-level sets are convex.
A crisp set C in R^n is convex if and only if for any two points x1 ∈ C and x2 ∈ C,
their combination λx1 + (1-λ)x2 is still in C, where 0 ≤ λ ≤ 1. Hence each
α-level set of a convex fuzzy set on the real line is composed of a single line segment (interval) only.
The convexity definition for fuzzy sets is not as strict as the common definition of
convexity of a function; a function f is convex if
f(λz1 + (1-λ)z2) ≤ λf(z1) + (1-λ)f(z2),
which is a more stringent condition than equation (1).
8. Fuzzy numbers: A fuzzy number A is a fuzzy set in the real line (R) that
satisfies the conditions for normality and convexity.
Most of the fuzzy numbers used in the literature satisfy these
conditions for normality and convexity.
9. Bandwidth of normal and convex fuzzy sets: For a normal and
convex fuzzy set, the bandwidth or width is defined as the distance between
the two unique crossover points:
width(A) = |x2 - x1|,
where μA(x1) = μA(x2) = 0.5.
10. Symmetry: A fuzzy set A is symmetric if its MF is symmetric around
a certain point x = c, namely μA(c + x) = μA(c - x) for all x ∈ X.
11. Open left, open right, closed: A fuzzy set A is open left if
lim(x→-∞) μA(x) = 1 and lim(x→+∞) μA(x) = 0;
open right if lim(x→-∞) μA(x) = 0 and lim(x→+∞) μA(x) = 1;
and closed if lim(x→-∞) μA(x) = lim(x→+∞) μA(x) = 0.
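The nomenclature above translates directly into code for a discrete fuzzy set. A small Python sketch with made-up membership values:

A = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.7, 5: 0.0}   # illustrative fuzzy set

support = [x for x, mu in A.items() if mu > 0]      # {x | mu(x) > 0}
core    = [x for x, mu in A.items() if mu == 1.0]   # {x | mu(x) = 1}

def alpha_cut(fs, alpha, strong=False):
    """Crisp alpha-cut: {x | mu(x) >= alpha} (or > alpha for a strong cut)."""
    return [x for x, mu in fs.items() if (mu > alpha if strong else mu >= alpha)]

print(support)            # [1, 2, 3, 4]
print(core)               # [3]
print(alpha_cut(A, 0.6))  # [2, 3, 4]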
Lecture-5
MF Formulation and Parameterization:-
A fuzzy set is completely characterized by its MF.
There are various ways of defining a membership function:
Discretized: the membership value of every pair is explicitly stated.
A more convenient and concise way is to represent the membership
function by some mathematical formula, e.g. μ(x) = 1/(1 + x²).
We can also define membership functions through classes of
parameterized functions.
MFs of one dimension have one input.
Fuzzy complement:
A fuzzy complement operator is a continuous function N: [0, 1] → [0, 1] which
meets the following axiomatic requirements:
N(0) = 1 and N(1) = 0 (boundary)
N(a) ≥ N(b) if a ≤ b (monotonicity)
Any function satisfying these requirements forms the general class of fuzzy
complements. A violation of the boundary conditions would include functions that
do not conform to the ordinary complement for crisp sets. The monotonic
decreasing requirement is essential, since we intuitively expect that an increase in the
membership grade of a fuzzy set must result in a decrease in the membership grade
of its complement.
Another optional requirement imposes involution on a fuzzy complement:
N(N(a)) = a (Involution)
Sugeno's Complement:
One class of fuzzy complements is Sugeno's complement, defined by:
N_s(a) = (1 - a) / (1 + s·a)
where s is a parameter greater than -1.
Yager's Complement:
Another class of fuzzy complements is Yager's complement, defined by:
N_w(a) = (1 - a^w)^(1/w)
where w is a positive parameter.
Both Sugeno's and Yager's complements are symmetric about the 45° line
connecting (0, 0) and (1, 1).
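The two parameterized complements can be checked numerically with a few lines of Python (a sketch; the parameter values below are arbitrary):

def sugeno_complement(a, s):
    """Sugeno class N_s(a) = (1 - a) / (1 + s*a), with s > -1."""
    return (1 - a) / (1 + s * a)

def yager_complement(a, w):
    """Yager class N_w(a) = (1 - a**w) ** (1/w), with w > 0."""
    return (1 - a ** w) ** (1.0 / w)

print(sugeno_complement(0.3, s=2))   # parameterized complement of 0.3
print(yager_complement(0.3, w=2))
# Both reduce to the classical complement 1 - a when s = 0 or w = 1.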
Lecture-6
Fuzzy intersection and Union:
The intersection of two fuzzy sets A and B is specified in general by a function
T: [0, 1] × [0, 1] → [0, 1] which aggregates two membership grades as
follows:
μA∩B(x) = T(μA(x), μB(x)) = μA(x) * μB(x), where * is a binary operator for the function T.
This class of fuzzy intersection operators is usually referred to as T-norm operators,
which should meet the following requirements.
T-norm:
A T-norm operator is a two-place function satisfying:
Imposes the correct generalization to crisp sets:
T(0, 0) = 0; T(1, a) = T(a, 1) = a (Boundary condition)
A decrease in the membership value in A or B cannot produce an increase in the
membership value in A ∩ B:
T(a, b) ≤ T(c, d) if a ≤ c and b ≤ d (Monotonicity)
Allows us to take the intersection of any number of sets in any order of pairwise
grouping:
T(a, T(b, c)) = T(T(a, b), c) (Associativity)
Indicates that the operator is indifferent to the order of its arguments:
T(a, b) = T(b, a) (Commutativity)
Four T-norm operators:
Minimum: T_min(a, b) = min(a, b) = a ∧ b
Algebraic product: T_ap(a, b) = ab
Bounded product: T_bp(a, b) = 0 ∨ (a + b - 1)
Drastic product: T_dp(a, b) = a if b = 1; b if a = 1; 0 if a, b < 1
E.g.: With the understanding that a and b are between 0 and 1, we can draw plots.
Let a = μA(x) = trapezoid(x; 3, 8, 12, 17)
b = μB(y) = trapezoid(y; 3, 8, 12, 17)
Fuzzy Union:
The union operator is specified in general by a function S: [0, 1] × [0, 1] → [0, 1].
In symbols: μA∪B(x) = S(μA(x), μB(x)) = μA(x) ⊕ μB(x), where ⊕ is a binary operator for the function S.
S(1, 1) = 1,
S(0, a) = S(a, 0) = a (Boundary)
S(a, b) ≤ S(c, d) if a ≤ c and b ≤ d (Monotonicity)
S(a, b) = S(b, a) (Commutativity)
S(a, S(b, c)) = S(S(a, b), c) (Associativity)
Four T-conorm operators:-
Maximum: S_max(a, b) = max(a, b) = a ∨ b
Algebraic sum: S_as(a, b) = a + b - ab
Bounded sum: S_bs(a, b) = 1 ∧ (a + b)
Drastic sum: S_ds(a, b) = a if b = 0; b if a = 0; 1 if a, b > 0
The four operators are ordered as:
S_max(a, b) ≤ S_as(a, b) ≤ S_bs(a, b) ≤ S_ds(a, b)
Generalized De Morgan's Law:
T-norms T(·,·) and T-conorms S(·,·) are duals which support the generalization of
De Morgan's law:
T(a, b) = N(S(N(a), N(b))),
S(a, b) = N(T(N(a), N(b)))
This can also be written as:
a * b = N(N(a) ⊕ N(b)),
a ⊕ b = N(N(a) * N(b))
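A compact Python sketch of the four T-norms and four T-conorms above; evaluating them at one point also illustrates the ordering T_dp ≤ T_bp ≤ T_ap ≤ T_min and S_max ≤ S_as ≤ S_bs ≤ S_ds.

# The four T-norms and their dual T-conorms from this lecture.
def t_min(a, b): return min(a, b)                       # minimum
def t_ap(a, b):  return a * b                           # algebraic product
def t_bp(a, b):  return max(0.0, a + b - 1)             # bounded product
def t_dp(a, b):                                         # drastic product
    return a if b == 1 else b if a == 1 else 0.0

def s_max(a, b): return max(a, b)                       # maximum
def s_as(a, b):  return a + b - a * b                   # algebraic sum
def s_bs(a, b):  return min(1.0, a + b)                 # bounded sum
def s_ds(a, b):                                         # drastic sum
    return a if b == 0 else b if a == 0 else 1.0

a, b = 0.6, 0.7
print([t(a, b) for t in (t_min, t_ap, t_bp, t_dp)])  # non-increasing sequence
print([s(a, b) for s in (s_max, s_as, s_bs, s_ds)])  # non-decreasing sequence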
Lecture-7
Triangular MFs:
A triangular MF is specified by three parameters {a, b, c}, with a < b < c:
triangle(x; a, b, c) =
  0,               x ≤ a
  (x - a)/(b - a), a ≤ x ≤ b
  (c - x)/(c - b), b ≤ x ≤ c
  0,               c ≤ x
Alternatively:
triangle(x; a, b, c) = max(min((x - a)/(b - a), (c - x)/(c - b)), 0)
The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three
corners of the underlying triangular MF.
Trapezoidal MFs:
A trapezoidal MF is specified by four parameters {a, b, c, d}, with a < b < c < d, as follows:
trapezoid(x; a, b, c, d) =
  0,               x ≤ a
  (x - a)/(b - a), a ≤ x ≤ b
  1,               b ≤ x ≤ c
  (d - x)/(d - c), c ≤ x ≤ d
  0,               d ≤ x
Alternatively:
trapezoid(x; a, b, c, d) = max(min((x - a)/(b - a), 1, (d - x)/(d - c)), 0)
The parameters {a, b, c, d} determine the x coordinates of the four corners of the
underlying trapezoidal MF.
When b = c, a trapezoidal MF is equivalent to a triangular MF.
Gaussian MFs:
A Gaussian MF is specified by two parameters {c, σ}:
gaussian(x; c, σ) = e^(-½((x - c)/σ)²)
A Gaussian MF is determined completely by:
c, which represents the centre of the MF;
σ, which determines the MF's width.
E.g.: gaussian(x; 50, 20)
Generalized bell MFs:
A generalized bell MF (or bell MF) is specified by three parameters {a, b, c}:
bell(x; a, b, c) = 1 / (1 + |(x - c)/a|^(2b))
The parameter b is usually positive. If b is negative the MF becomes an
upside-down bell. Adjusting c and a varies the centre and width of the MF, and
b is then used to control the slope at the crossover points. Since the bell function has one
extra parameter compared to the Gaussian, it has one more degree of freedom to adjust
the steepness at the crossover points.
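A short Python sketch of the four parameterized MFs defined above; the sample points used in the print statements are illustrative.

import math

def triangle(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

print(trapezoid(10, 3, 8, 12, 17))   # 1.0 (on the plateau between b and c)
print(gaussian(50, 50, 20))          # 1.0 at the centre
print(bell(60, 20, 4, 50))           # close to 1 inside the bell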
Lecture-8
Sigmoidal MFs: (used to specify asymmetric membership functions)
A sigmoidal MF is defined by: sig(x; a, c) = 1 / (1 + exp(-a(x - c)))
where a controls the slope at the crossover point x = c.
Depending on the sign of the parameter a, the sigmoidal MF is open right or open left.
It is used widely as the activation function of artificial neural networks.
a positive: open right; a negative: open left.
Closed and asymmetric MFs based on sigmoidal functions:
These are obtained by taking the difference of two sigmoidal functions (and also by taking the
product of two sigmoidal functions):
|y1 - y2|, with y1 = sig(x; 1, -5) and y2 = sig(x; 2, 5)
Left-Right MF:
A left-right MF or L-R MF is specified by three parameters {α, β, c}:
LR(x; c, α, β) =
  F_L((c - x)/α), x ≤ c
  F_R((x - c)/β), x ≥ c
where F_L(x) and F_R(x) are monotonically decreasing functions defined on [0, ∞)
with F_L(0) = F_R(0) = 1 and lim(x→∞) F_L(x) = lim(x→∞) F_R(x) = 0.
E.g.: F_L(x) = max(0, √(1 - x²))
F_R(x) = e^(-|x|)
Any type of continuous probability distribution function can be used as an MF,
provided that a set of parameters is given to specify the appropriate meaning of the MF.
MFs of Two Dimensions:
MFs with two inputs, each in a different universe of discourse.
Cylindrical extension of a one-dimensional fuzzy set:
If A is a fuzzy set in X, then its cylindrical extension in X × Y is a fuzzy set c(A)
defined by:
c(A) = ∫(X×Y) μA(x)/(x, y)
(1.) A = {1, 3, 5}, B = {1, 3, 5}
A × B = {(1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5)}
R: {(x, y) | y = x + 2}
S: {(x, y) | x < y}
R = {(1, 3), (3, 5)}
S = {(1, 3), (1, 5), (3, 5)}
The relation matrices follow directly from these pairs.
R ∪ S = max(R(x, y), S(x, y)) = S
R ∩ S = min(R(x, y), S(x, y)) = R
R ∘ S = {(x, z): (x, z) ∈ X × Z, ∃ y ∈ Y such that (x, y) ∈ R and (y, z) ∈ S}
Lecture-9
Max-Min Composition:
T = R ∘ S
μT(x, z) = max over y ∈ Y of min(μR(x, y), μS(y, z))
R∘S(1, 1) = max(min(R(1,1), S(1,1)), min(R(1,3), S(3,1)), min(R(1,5), S(5,1)))
          = max(min(0, 0), min(1, 0), min(0, 0))
          = max(0, 0, 0)
          = 0
R∘S(1, 3) = max(0, 0, 0) = 0
R∘S(1, 5) = max(0, 1, 0) = 1
Similarly R∘S(3,1) = R∘S(3,3) = R∘S(3,5) = R∘S(5,1) = R∘S(5,3) = R∘S(5,5) = 0
So the relation matrix of R∘S has a single nonzero entry:
R∘S = {(1, 5)}
Also S∘R = {(1, 5)}
(2.) A = {(x1, 0.2), (x2, 0.7), (x3, 0.4)}
B = {(y1, 0.5), (y2, 0.6)}
R = A × B, with μR(x, y) = min(μA(x), μB(y)):
        y1    y2
  x1   0.2   0.2
  x2   0.5   0.6
  x3   0.4   0.4
X = {x1, x2, x3}, Y = {y1, y2}, Z = {z1, z2, z3}
R(x, y):
        y1    y2
  x1   0.5   0.1
  x2   0.2   0.9
  x3   0.8   0.6
S(y, z):
        z1    z2    z3
  y1   0.6   0.4   0.7
  y2   0.5   0.8   0.9
R∘S(x, z) = max over y of min(μR(x, y), μS(y, z))
R∘S(x1, z1) = max(min(R(x1,y1), S(y1,z1)), min(R(x1,y2), S(y2,z1)))
            = max(min(0.5, 0.6), min(0.1, 0.5))
            = max(0.5, 0.1)
            = 0.5
R∘S(x1, z2) = max(min(R(x1,y1), S(y1,z2)), min(R(x1,y2), S(y2,z2)))
            = max(min(0.5, 0.4), min(0.1, 0.8))
            = max(0.4, 0.1)
            = 0.4
Proceeding the same way for the remaining entries:
R∘S:
        z1    z2    z3
  x1   0.5   0.4   0.5
  x2   0.5   0.8   0.9
  x3   0.6   0.6   0.7
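The worked example above can be verified with a short Python sketch that implements max-min composition of two relation matrices; the values are taken from the example.

def max_min_composition(R, S):
    """R: n x p matrix, S: p x m matrix; returns the n x m max-min composition."""
    n, p, m = len(R), len(S), len(S[0])
    return [[max(min(R[i][k], S[k][j]) for k in range(p)) for j in range(m)]
            for i in range(n)]

R = [[0.5, 0.1],          # mu_R(x_i, y_j) from the example above
     [0.2, 0.9],
     [0.8, 0.6]]
S = [[0.6, 0.4, 0.7],     # mu_S(y_j, z_k)
     [0.5, 0.8, 0.9]]

print(max_min_composition(R, S))
# [[0.5, 0.4, 0.5], [0.5, 0.8, 0.9], [0.6, 0.6, 0.7]]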
(3.) P = {P1, P2, P3, P4} plants
D = {D1, D2, D3, D4} diseases
S = {S1, S2, S3, S4} symptoms
R1 = P × D (relation between plants and diseases)
R2 = D × S (relation between diseases and symptoms)
Lecture-10
**Find the association of the plants with the different symptoms, i.e. find
P × S, using max-min composition.
Solution:
P × S = R1 ∘ R2
Extension Principle:
It is used for extending crisp domains of mathematical expressions to fuzzy
domains. It generalizes a common point-to-point mapping of a function f(·) to a
mapping between fuzzy sets.
Suppose f is a function from X to Y and A is a fuzzy set on X defined as
A = μA(x1)/x1 + μA(x2)/x2 + ... + μA(xn)/xn
By the extension principle, the image of fuzzy set A under the mapping f(·) can be
expressed as a fuzzy set B:
B = f(A) = μA(x1)/y1 + μA(x2)/y2 + ... + μA(xn)/yn
where yi = f(xi); i = 1, ..., n
Let A = 0.1/-2 + 0.4/-1 + 0.8/0 + 0.9/1 + 0.3/2
      = {(-2, 0.1), (-1, 0.4), (0, 0.8), (1, 0.9), (2, 0.3)}
f(x) = x² - 3
By applying the extension principle:
B = 0.1/1 + 0.4/-2 + 0.8/-3 + 0.9/-2 + 0.3/1
  = 0.8/-3 + (0.4 ∨ 0.9)/-2 + (0.1 ∨ 0.3)/1
  = 0.8/-3 + 0.9/-2 + 0.3/1
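The same computation can be carried out by a small Python sketch of the extension principle (taking the max of memberships that map to the same image point):

def extend(A, f):
    """Image of a discrete fuzzy set A under f, by the extension principle:
       mu_B(y) = max over all x with f(x) = y of mu_A(x)."""
    B = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)
    return B

A = {-2: 0.1, -1: 0.4, 0: 0.8, 1: 0.9, 2: 0.3}
print(extend(A, lambda x: x**2 - 3))   # {1: 0.3, -2: 0.9, -3: 0.8}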
Fuzzy Relation:
Binary fuzzy relation: - Let X and Y be two universes of discourse; then
R = {((x, y), μR(x, y)) | (x, y) ∈ X × Y} is a binary fuzzy relation in X × Y.
μR(x, y) is the two-dimensional membership function.
Suppose X = {3, 4, 5}
Y = {3, 4, 5, 6, 7}
R: "y is greater than x"
μR(x, y) = (y - x)/(y + x + 2) if y > x; 0 if y ≤ x
Relation matrix (rows x = 3, 4, 5; columns y = 3, 4, 5, 6, 7), computed from the formula:
R =
  [0   0.11  0.20  0.27  0.33]
  [0   0     0.09  0.17  0.23]
  [0   0     0     0.08  0.14]
Max-Min Composition:
Let R1 and R2 be two fuzzy relations defined on X × Y and Y × Z, respectively.
The max-min composition of R1 and R2 is a fuzzy relation defined by
R1 ∘ R2 = {[(x, z), max over y of min(μR1(x, y), μR2(y, z))] | x ∈ X, y ∈ Y, z ∈ Z}
μR1∘R2(x, z) = max over y of min(μR1(x, y), μR2(y, z))
This is the same as matrix multiplication, except that × and + are replaced by min and max.
Properties common to max-min composition of binary relations:-
Associativity: R ∘ (S ∘ T) = (R ∘ S) ∘ T
Distributivity over union: R ∘ (S ∪ T) = (R ∘ S) ∪ (R ∘ T)
Weak distributivity over intersection: R ∘ (S ∩ T) ⊆ (R ∘ S) ∩ (R ∘ T)
Monotonicity: S ⊆ T ⟹ R ∘ S ⊆ R ∘ T
Max-Product Composition:-
μR1∘R2(x, z) = max over y of [μR1(x, y) · μR2(y, z)]
Fuzzy Logic:-
In crisp logic, the truth values acquired by propositions or predicates (a proposition is a
statement which is either true or false but not both) are two-valued, namely true or
false, equivalent to {0, 1}.
In fuzzy logic, the truth values may be multi-valued, numerically lying in the interval [0, 1].
A fuzzy proposition P has a truth value T(P) in [0, 1].
In the simplest form, a fuzzy proposition P is associated with a fuzzy set A; the fuzzy
membership value associated with the fuzzy set A gives the truth value of P:
T(P) = μA(x)
Fuzzy connectives:        Use
Negation     ¬P           1 - T(P)
Disjunction  P ∨ Q        max(T(P), T(Q))
Conjunction  P ∧ Q        min(T(P), T(Q))
Implication  P ⟹ Q        max(1 - T(P), T(Q))
Implication (⟹) represents an if-then statement.
IF-THEN statements:
IF x is A THEN y is B
is equivalent to the relation R = (A × B) ∪ (A^c × Y)
Example: We have two fuzzy sets A and B on universes U1 and U2, where U1 and U2
are identical and have the integers 1 to 10 as elements.
U1 = U2 = {1, 2, 3, 4, ..., 10}
Let A be "approximately 2" = {0.6/1 + 1/2 + 0.8/3}
and B be "approximately 6" = {0.8/5 + 1/6 + 0.7/7}
Find "approximately 12" = 2 × 6:
2 × 6 = {(0.6/1 + 1/2 + 0.8/3) × (0.8/5 + 1/6 + 0.7/7)}
      = {min(0.6, 0.8)/5 + min(0.6, 1)/6 + ... + min(0.8, 1)/18 + min(0.8, 0.7)/21}
      = {0.6/5 + 0.6/6 + 0.6/7 + 0.8/10 + 1/12 + 0.7/14 + 0.8/15 + 0.8/18 + 0.7/21}
Exercise: A = {0.2/1 + 1/2 + 0.7/4}, B = {0.5/1 + 1/2}
Find the arithmetic product A × B.
2) Suppose we have a universe of integers Y = {1, 2, 3, 4, 5}. We define the
following linguistic terms as a mapping onto Y:
"small" = {1/1 + 0.8/2 + 0.6/3 + 0.4/4 + 0.2/5}
"large" = {0.2/1 + 0.4/2 + 0.6/3 + 0.8/4 + 1/5}
Modify these two linguistic terms with hedges:
very small = small²
not very small = 1 - very small
not very small and not very very large = (1 - very small) ∩ (1 - (large)⁴)
intensely small (contrast intensification)
= {(1 - 2(1-1)²)/1 + (1 - 2(1-0.8)²)/2 + (1 - 2(1-0.6)²)/3 + 2(0.4)²/4 + 2(0.2)²/5}
= {1/1 + 0.92/2 + 0.68/3 + 0.32/4 + 0.08/5}
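A small Python sketch of the hedges used above ("very" as squaring, "more or less" as square root, and contrast intensification); it reproduces the "intensely small" result.

small = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}

def hedge(fs, p):
    """Raise every membership grade to the power p (very: p=2, more-or-less: p=0.5)."""
    return {x: mu ** p for x, mu in fs.items()}

def intensify(fs):
    """Contrast intensification: 2*mu^2 below 0.5, 1 - 2*(1-mu)^2 above 0.5."""
    return {x: 2 * mu**2 if mu <= 0.5 else 1 - 2 * (1 - mu)**2
            for x, mu in fs.items()}

very_small = hedge(small, 2)
print(intensify(small))   # {1: 1.0, 2: 0.92, 3: 0.68, 4: 0.32, 5: 0.08}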
Lecture-11
Linguistic Variables:
According to cognitive scientist, human base their thinking primarily on
conceptual patterns and mental images rather than numerical computation. Also,
human communicate with their own natural language by referring to previous
mental images. Despite of the vagueness and ambiguity in natural language, it is
the most powerful form of conveying information the human poses for any given
problem or situation that requires solving or reasoning. Also human
communication in natural language has very little trouble in basic understanding.
The conventional techniques for system analysis are intrinsically unsuited for
dealing with humanistic system, whose behavior is strongly influenced by human
judgment, perception and emotion. This manifestation may be termed as principle
of incompatibility. As the complexity of the system increases, our ability to make
precise and yet significant statement about its behavior diminished until a threshold
is reached beyond which precision and significance become mutually inclusive
characteristics. This belief let Zadeh to propose the concept of linguistic variables.
A linguistic variable differ from linguistic value in that its value are not number by
words or sometimes in natural language
Definition: A linguistic variable is characterized by a quintuple (x, T(x), X, G, M) in
which
x is the name of the variable;
T(x) is the term set - the set of its linguistic values or linguistic terms;
X is the universe of discourse;
G is the syntactic rule which generates the terms in T(x);
M is a semantic rule which associates with each linguistic value A its meaning
M(A), where M(A) denotes a fuzzy set in X.
The semantic rule defines the MF for each linguistic value in the term set.
Examples:- If age is interpreted as a linguistic variable then its term set
T(age)={ young, not young, not very young,............... middle aged, not middle
aged,............old, not old, very old, more or less old....... not very young.........}
Each term in T(age) is characterized by a fuzzy set on the universe of discourse X.
If age is interpreted as a numerical variable we may say age = 20, but when it is interpreted as a
linguistic variable we may say "age is young", meaning the linguistic value "young" is assigned to age.
Age: young
Primary terms: young, old, middle aged
Hedges: very, more or less, extremely
Linguistic hedges:
In linguistics the fundamental atomic terms/primary terms are often modified with
adjectives or adverbs like "very", "low", "slightly", "more or less". These modifiers are
termed linguistic hedges.
Connectives: and, or, either, neither.
Concentration and Dilation:
Let A be a linguistic value characterized by a fuzzy set with membership function
μA(·). Then A^k is interpreted as a modified version of the original linguistic value,
expressed as
A^k = ∫X [μA(x)]^k / x
In particular, concentration is defined as:
CON(A) = A², used for "very".
Dilation: DIL(A) = A^0.5, used for "more or less".
Lecture-12
Constructing MFs for Composite Linguistic Terms:
Let "young" and "old" be two linguistic values characterized by generalized bell MFs:
μ_young(x) = bell(x; 20, 2, 0) = 1 / (1 + (x/20)⁴)
μ_old(x) = bell(x; 30, 3, 100) = 1 / (1 + ((x - 100)/30)⁶)
more or less old = DIL(old) = (old)^0.5
= ∫X [1 / (1 + ((x - 100)/30)⁶)]^0.5 / x
not young and not old = ¬young ∩ ¬old
= ∫X (1 - 1/(1 + (x/20)⁴)) ∧ (1 - 1/(1 + ((x - 100)/30)⁶)) / x
young but not too young = young ∩ ¬(young)²   (assuming that the meaning of "too" is the same as "very")
= ∫X (1/(1 + (x/20)⁴)) ∧ (1 - [1/(1 + (x/20)⁴)]²) / x
extremely old = CON(CON(CON(old))) = (((old)²)²)² = old⁸
= ∫X [1 / (1 + ((x - 100)/30)⁶)]⁸ / x
Lecture-13
Contrast Intensification:
Another operation that reduces the fuzziness of a fuzzy set A is intensification, defined as
INT(A) = 2A²        for 0 ≤ μA(x) ≤ 0.5
         ¬2(¬A)²    for 0.5 ≤ μA(x) ≤ 1
i.e. μINT(A)(x) = 2μA(x)² for μA(x) ≤ 0.5 and 1 - 2(1 - μA(x))² for μA(x) ≥ 0.5.
It increases the values of μA(x) which are above 0.5 and diminishes those which
are below this point.
Orthogonality:-
A term set T = {t1, ..., tn} of a linguistic variable x on the universe X is orthogonal if it
fulfills the following property:
Σ (i = 1 to n) μ_ti(x) = 1, ∀ x ∈ X
where the ti's are convex and normal fuzzy sets defined on X and these fuzzy sets make
up the term set T.
For the MFs in a term set to be intuitively reasonable, the
orthogonality requirement has to be followed to some extent.
Fuzzy If-Then Rules:-
A fuzzy if-then rule (also known as a fuzzy rule, fuzzy implication, or fuzzy
conditional statement) assumes the form
If x is A then y is B
where A and B are linguistic values defined by fuzzy sets on the universes of discourse
X and Y. "If x is A then y is B" is abbreviated as A → B. The expression describes a
relation between the two variables x and y. One may then say that a fuzzy if-then rule can
be defined as a binary relation R on the product space X × Y.
Generally speaking, there are two ways to interpret the fuzzy rule A → B.
1. If we interpret A → B as "A coupled with B", then
R = A → B = ∫(X×Y) μA(x) * μB(y) / (x, y)
where * is a T-norm operator.
2. If A → B is interpreted as "A entails B", then it can be written as
four different formulas:
Material implication:
R = A → B = ¬A ∪ B
Propositional calculus:
R = A → B = ¬A ∪ (A ∩ B)
Extended propositional calculus:
R = A → B = (¬A ∩ ¬B) ∪ B
Generalized modus ponens:
μR(x, y) = sup{c | μA(x) * c ≤ μB(y) and 0 ≤ c ≤ 1} ---- equation A
where R = A → B and * is a T-norm operator.
Based on these two interpretations and the various T-norm and T-conorm
operators, a number of qualified methods can be formulated to calculate the fuzzy
relation R = A → B.
R can be viewed as a fuzzy set with a 2D MF:
μR(x, y) = f(μA(x), μB(y)) = f(a, b)
where a = μA(x), b = μB(y) and f is a fuzzy implication function.
It performs the job of transforming the membership grades of x in A and of y in B
into those of (x, y) in A → B.
For the 1st interpretation, "A coupled with B" as the meaning of A → B, there are four
different fuzzy relations, exploring four different T-norm operators:
1. Proposed by Mamdani:
R_m = A × B = ∫(X×Y) μA(x) ∧ μB(y) / (x, y)
or f_c(a, b) = a ∧ b (min operator used)
2. Proposed by Larsen:
R_p = A × B = ∫(X×Y) μA(x)·μB(y) / (x, y)
or f_p(a, b) = ab (algebraic product used for the conjunction)
3. Bounded product:
R_bp = A × B = ∫(X×Y) 0 ∨ (μA(x) + μB(y) - 1) / (x, y)
or f_bp(a, b) = 0 ∨ (a + b - 1) (bounded product)
4. Drastic product:
R_dp = A × B = ∫(X×Y) μA(x) ·̂ μB(y) / (x, y)
or f_dp(a, b) = a if b = 1; b if a = 1; 0 if a, b < 1
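As a numerical illustration of the first two operators (Mamdani's min and Larsen's product), the following Python sketch builds the implication relation R(x, y) for two small discrete fuzzy sets; the membership values are illustrative, not taken from the notes.

A = {'x1': 0.3, 'x2': 0.8}          # illustrative antecedent fuzzy set
B = {'y1': 0.5, 'y2': 1.0}          # illustrative consequent fuzzy set

def implication(A, B, f):
    """Build the relation R(x, y) = f(mu_A(x), mu_B(y)) as a nested dict."""
    return {x: {y: f(a, b) for y, b in B.items()} for x, a in A.items()}

R_mamdani = implication(A, B, min)                 # R_m: min operator
R_larsen  = implication(A, B, lambda a, b: a * b)  # R_p: algebraic product

print(R_mamdani)   # {'x1': {'y1': 0.3, 'y2': 0.3}, 'x2': {'y1': 0.5, 'y2': 0.8}}
print(R_larsen)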
For the 2nd interpretation, "A entails B", the relation A → B can be computed by the
following formulas.
Lecture-14
Zadeh's arithmetic rule:
R_a = ¬A ⊕ B = ∫(X×Y) 1 ∧ (1 - μA(x) + μB(y)) / (x, y)
or f_a(a, b) = 1 ∧ (1 - a + b) (bounded sum used for the union operator)
Zadeh's max-min rule:
R_mm = ¬A ∪ (A ∩ B) = ∫(X×Y) (1 - μA(x)) ∨ (μA(x) ∧ μB(y)) / (x, y)
or f_mm(a, b) = (1 - a) ∨ (a ∧ b) (min for intersection and max for union)
Boolean fuzzy implication, using max for the union:
R_s = ¬A ∪ B = ∫(X×Y) (1 - μA(x)) ∨ μB(y) / (x, y)
or f_s(a, b) = (1 - a) ∨ b
Goguen's fuzzy implication:
R_g = ∫(X×Y) (μA(x) <̂ μB(y)) / (x, y)
where a <̂ b = 1 if a ≤ b; b/a if a > b
It follows "A entails B" by using the algebraic product for the T-norm operator.
Rule Based Systems:
The most common way to represent human knowledge is to form it into
natural language expressions of the type
IF premise (antecedent) THEN conclusion (consequent),
referred to as the IF-THEN rule-based form. This form of knowledge representation is
characterized as shallow knowledge.
The linguistic variables involved can be naturally represented by fuzzy sets and the logical
connectives of these sets.
Canonical Rule Forms:
In general, three forms exist for any linguistic variable:
Assignment statements
Conditional statements
Unconditional statements
Assignment:
X = large
Temperature = hot
Conditional Statement:
IF the tomato is red THEN the tomato is ripe.
IF x is very hot THEN stop.
IF x is very large THEN y is small ELSE y is not small.
Unconditional Statement:
Stop.
Divide by x.
Turn the pressure higher.
The assignment statement restricts the value of a variable to a specific quantity.
The unconditional statement may be thought of as a conditional restriction whose
IF clause condition is the universe of discourse of the input condition,
which is always true:
IF any condition THEN the pressure is high.
Hence the rule base can be described using a collection of conditional restriction
statements. These restrictions are usually manifested in terms of vague natural
language words that can be modeled using fuzzy mathematics.
Lecture-15
Conditional form for a fuzzy rule based system:
Rule-1: IF condition C1 THEN restriction R1
Rule-2: IF condition C2 THEN restriction R2
...
Rule-r: IF condition Cr THEN restriction Rr
Decomposition of Compound Rules:
IF__THEN__, IF__THEN__ELSE__, IF__AND__THEN__
By using the basic properties and operations defined for fuzzy sets, any compound
rule structure may be reduced to a number of simple canonical rules. These rules are
based on natural language representations and models which are themselves based
on fuzzy sets and fuzzy logic.
The most common techniques for decomposition are given below.
Multiple conjunctive antecedents:
IF x is A1 and A2 and ... and AL THEN y is Bs
Assume a new fuzzy subset As defined as
As = A1 ∩ A2 ∩ ... ∩ AL
expressed by the membership function
μAs(x) = min[μA1(x), μA2(x), ..., μAL(x)]
Based on the definition of the fuzzy intersection operation, the
compound rule may be rewritten as IF As THEN Bs.
Multiple disjunctive antecedents:
IF x is A1 OR x is A2 OR ... OR x is AL THEN y is Bs
could be written as
IF x is As THEN y is Bs
where the fuzzy set As is defined as
As = A1 ∪ A2 ∪ ... ∪ AL
μAs(x) = max[μA1(x), μA2(x), ..., μAL(x)]
Conditional Statements:
(1) IF A1 THEN (B1 ELSE B2)
may be decomposed as
IF A1 THEN B1
OR
IF NOT A1 THEN B2
(2) IF A1 THEN B1 UNLESS A2
IF A1 THEN B1
OR
IF A2 THEN NOT B1
(3) IF A1 THEN B1 (ELSE IF A2 THEN B2)
IF A1 THEN B1
OR
IF NOT A1 AND A2 THEN B2
NESTED IF-THEN RULE:
IF A1 THEN (IF A2 THEN B1)
IF A1 AND A2 THEN B1
Aggregation of Fuzzy Rules:
Most rule based systems include more than one rule. The process of finding the
overall consequent (conclusion) from the individual consequents contributed by
each rule in the rule base is known as aggregation of rules.
Two simple extreme cases:
a. Conjunctive system of rules:
In this case the rules must be jointly satisfied; the rules are connected by the "and"
connective.
The aggregated output is the fuzzy intersection of the individual rule consequents
yi, i = 1, 2, ..., r:
y = y1 and y2 and ... and yr
y = y1 ∩ y2 ∩ ... ∩ yr
which is defined by the membership function
μy(y) = min(μy1(y), μy2(y), ..., μyr(y)), y ∈ Y
b. Disjunctive system of rules:
For the case of a disjunctive system of rules, where the satisfaction of at least
one rule is required, the rules are connected by the "or" connective. In this case
the aggregated output is found by the fuzzy union of all individual rule
contributions:
y = y1 or y2 or ... or yr
y = y1 ∪ y2 ∪ ... ∪ yr
which is defined by the membership function
μy(y) = max(μy1(y), μy2(y), ..., μyr(y)), y ∈ Y
Lecture-17
Defuzzification:
In many situations, for a system whose output is fuzzy, it is easier to take a
crisp output if the output is represented as a single scalar quantity. This
conversion of a fuzzy set to a single crisp value is called defuzzification and is the
reverse process of fuzzification.
Centroid Method:
Also known as the centre of gravity or centre of area method. It obtains the centre of
the area occupied by the fuzzy set:
x* = ∫ μ(x)·x dx / ∫ μ(x) dx   (continuous case)
x* = Σ (i = 1 to n) xi·μ(xi) / Σ (i = 1 to n) μ(xi)   (discrete case)
Here, n is the number of elements,
xi is an element,
μ(xi) is its membership value.
Compositional Rule of Inference:
y = f(x) regulates the relation between x and y.
If x = a then y = f(a) = b.
This can be generalized over an interval: to find the resulting y = b corresponding to an
interval x = a, a cylindrical extension of a is constructed and its intersection I with the
graph of f is found. The projection of I onto the y axis leads to the interval y = b.
If we assume F is a fuzzy relation on X × Y and A is a fuzzy set on X, then to
find B on Y we construct a cylindrical extension c(A) with base A. The
intersection of c(A) and F forms the analog of the region I. By projecting c(A) ∩ F
onto the Y axis we infer the fuzzy set B on the Y axis.
If μA, μc(A), μB and μF are the MFs of A, c(A), B and F respectively:
μc(A)(x, y) = μA(x)   (by cylindrical extension)
μc(A)∩F(x, y) = min[μc(A)(x, y), μF(x, y)]
             = min[μA(x), μF(x, y)]
By projecting c(A) ∩ F onto the Y axis we have
μB(y) = max over x of min[μA(x), μF(x, y)]
This reduces to the max-min composition of two relation matrices, A (a unary fuzzy
relation) and F (a binary fuzzy relation):
B = A ∘ F
The extension principle is also a special case of the compositional rule of inference. Using
the compositional rule we can formalize an inference procedure upon a set of fuzzy if-
then rules. The inference procedure is generally called approximate reasoning.
Lecture-18
Fuzzy Reasoning:
Premise (fact): x is A'.
Premise (rule): If x is A then y is B.
Consequence: y is B'.
This is generalized modus ponens, also called approximate reasoning.
Computational aspects of fuzzy reasoning:
Single rule with single antecedent:
μB'(y) = [A' ∘ R](y) = [A' ∘ (A → B)](y)
       = max over x of min(μA'(x), min(μA(x), μB(y)))
       = [∨_x (μA'(x) ∧ μA(x))] ∧ μB(y)
       = w ∧ μB(y)
Here we first find the degree of match w = ∨_x (μA'(x) ∧ μA(x));
then the MF of the resulting B' is equal to the MF of B clipped by w.
The degree of match is intuitively said to be a
measure of belief.
The measure of belief is propagated by if-then rules. The above figure shows the
graphical interpretation of GMP using Mamdani's fuzzy implication and max-min
composition.
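A minimal Python sketch of this single-rule, single-antecedent inference (degree of match, then clipping the consequent); the fuzzy sets are illustrative, not taken from the notes.

A  = {1: 0.2, 2: 1.0, 3: 0.6}        # rule antecedent "x is A" (illustrative)
Ap = {1: 0.0, 2: 0.7, 3: 1.0}        # observed fact "x is A'"
B  = {10: 0.4, 20: 1.0, 30: 0.5}     # rule consequent "y is B"

# Degree of match w = max over x of min(mu_A'(x), mu_A(x)).
w = max(min(Ap[x], A[x]) for x in A)

# Inferred consequent: B clipped at the firing strength w (max-min GMP).
Bp = {y: min(w, mu) for y, mu in B.items()}
print(w, Bp)    # 0.7  {10: 0.4, 20: 0.7, 30: 0.5}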
Single Rule with Multiple Antecedents:
Premise (fact): x is A' and y is B'.
Premise (rule): If x is A and y is B then z is C.
Consequence: z is C'.
The rule is represented (using Mamdani's implication) by
R_m(A, B, C) = (A × B) → C = ∫ μA(x) ∧ μB(y) ∧ μC(z) / (x, y, z)
C' = (A' × B') ∘ (A × B → C)
μC'(z) = ∨_x,y [(μA'(x) ∧ μB'(y)) ∧ (μA(x) ∧ μB(y) ∧ μC(z))]
       = {[∨_x (μA'(x) ∧ μA(x))] ∧ [∨_y (μB'(y) ∧ μB(y))]} ∧ μC(z)
       = (w1 ∧ w2) ∧ μC(z)
where w1 ∧ w2 is called the firing strength or degree of fulfillment of the rule.
Lecture-19
Multiple Rules with Multiple Antecedents:
The interpretation of multiple rules is usually taken as the union of the fuzzy relations
corresponding to the fuzzy rules. Therefore, for a GMP problem:
Premise 1 (fact): x is A' and y is B'.
Premise 2 (Rule 1): If x is A1 and y is B1 then z is C1.
Premise 3 (Rule 2): If x is A2 and y is B2 then z is C2.
Consequence: z is C'.
R1 = A1 × B1 → C1
R2 = A2 × B2 → C2
C' = (A' × B') ∘ (R1 ∪ R2)
   = [(A' × B') ∘ R1] ∪ [(A' × B') ∘ R2]
   = C1' ∪ C2'
Fuzzy Tolerance and Equivalence Relations:
A fuzzy relation R on a single universe X is a relation from X to X. It is a fuzzy
equivalence relation if all three of the following properties of the relation matrix hold:
Reflexivity: μR(xi, xi) = 1
Symmetry: μR(xi, xj) = μR(xj, xi)
Transitivity: μR(xi, xj) = λ1 and μR(xj, xk) = λ2 ⟹ μR(xi, xk) = λ,
where λ ≥ min[λ1, λ2].
A five-vertex graph of an equivalence relation is reflexive, symmetric and transitive.
Transitivity: a short chain implies a stronger relation.
In general, the strength of the link
between two elements must be greater than or equal to the strength of any indirect
chain involving other elements.
Tolerance:
A tolerance relation R on a universe X is a relation that exhibits only the properties
of reflexivity and symmetry.
A tolerance relation can be transformed into an equivalence
relation by at most (n - 1) compositions with itself, where n is the cardinal
number of the set defining R.
Suppose R1 is a 5 × 5 fuzzy relation on X = {x1, ..., x5} that is reflexive and symmetric,
with (among other entries)
μR1(x1, x2) = 0.8,
μR1(x2, x5) = 0.9 ≥ 0.8,
but μR1(x1, x5) < 0.8,
so R1 is reflexive and symmetric but not transitive.
Composing R1 with itself by max-min composition,
R1² = R1 ∘ R1
raises the indirect links (the x1-x5 entry becomes 0.8 through the chain x1-x2-x5), and
R1³ = R1² ∘ R1
is transitive, so R1³ is the fuzzy equivalence relation obtained from the tolerance relation R1.
Lecture-20
Value Assignment:
Where do the membership values contained in a relation come from? There
are many different ways:
1. Cartesian product
2. Closed form expression
3. Look-up table
4. Linguistic rules of knowledge
5. Classification
6. Similarity methods in data manipulation
1. Cartesian product of two or more fuzzy sets.
2. Through simple observation of a physical system: for a given set of inputs we
observe the process yielding a set of outputs. If there is no variation
between specific pairs of inputs and outputs, the process may be modeled with a crisp
relation, or may be expressed as Y = f(X), where X is a vector of inputs and Y
is a vector of outputs. These expressions are termed closed forms of
expression.
3. If some variability exists, membership values on the interval [0, 1] may lead us
to develop a fuzzy relation from a look-up table.
4. Fuzzy relations can also be assembled from linguistic knowledge expressed
as IF-THEN rules; such knowledge may come from experts or from polls.
5. Relations also arise from the notion of classification, where issues associated
with similarity are essential to determining relationships among patterns or
clusters of data.
6. Similarity methods in data manipulation: the most prevalent methods; these are
actually a family of procedures collectively known as similarity methods.
Cosine Amplitude:
Consider a collection of data samples, n data samples in particular. If these data
samples are collected they form a data array X:
X = {x1, x2, ..., xn}
Each element xi in the data array X is itself a vector of length m, that is
xi = {xi1, xi2, ..., xim}
Each data sample can be thought of as a point in an m-dimensional space, where each
point requires m coordinates for a complete description. Each relation value rij
results from the pairwise comparison of two data samples xi and xj, and the strength
of the relationship is given by the membership value rij.
The matrix [rij] is an n × n matrix; it is reflexive and symmetric, hence a
tolerance relation.
rij = |Σ (k = 1 to m) xik·xjk| / √[(Σ (k = 1 to m) xik²)(Σ (k = 1 to m) xjk²)],  i, j = 1, 2, ..., n
Closer inspection shows that this method is related to the dot product and the
cosine function. When two vectors are most similar (collinear), their dot product (cosine)
is unity; when two vectors are at right angles to one another, their dot product
is zero.
Q. Five separate regions have suffered an earthquake. A survey of damaged buildings is
made for the purpose of assessing payouts from the insurance companies to building
owners.
Region:                          X1    X2    X3    X4    X5
xi1 (ratio with no damage):      0.3   0.2   0.1   0.7   0.4
xi2 (ratio with medium damage):  0.6   0.4   0.6   0.2   0.6
xi3 (ratio with serious damage): 0.1   0.4   0.5   0.1   0
Express the similarity of damage of the regions using the cosine amplitude method:
rij = |Σ (k = 1 to 3) xik·xjk| / √[(Σ (k = 1 to 3) xik²)(Σ (k = 1 to 3) xjk²)]
e.g.
r12 = (0.3·0.2 + 0.6·0.4 + 0.1·0.4) / [(0.3² + 0.6² + 0.1²)(0.2² + 0.4² + 0.4²)]^(1/2)
    ≈ 0.836
Max-min method:
rij = Σ (k = 1 to m) min(xik, xjk) / Σ (k = 1 to m) max(xik, xjk)
where i, j = 1, 2, 3, ..., n.
For the same data:
r12 = [min(0.3, 0.2) + min(0.6, 0.4) + min(0.1, 0.4)] / [max(0.3, 0.2) + max(0.6, 0.4) + max(0.1, 0.4)]
    = (0.2 + 0.4 + 0.1) / (0.3 + 0.6 + 0.4)
    = 0.7 / 1.3 ≈ 0.538
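Both similarity measures can be checked with a short Python sketch using the earthquake-damage data from the example above.

import math

x = [
    [0.3, 0.6, 0.1],   # region x1: no / medium / serious damage ratios
    [0.2, 0.4, 0.4],   # region x2
    [0.1, 0.6, 0.5],   # region x3
    [0.7, 0.2, 0.1],   # region x4
    [0.4, 0.6, 0.0],   # region x5
]

def cosine_amplitude(a, b):
    num = abs(sum(ai * bi for ai, bi in zip(a, b)))
    den = math.sqrt(sum(ai**2 for ai in a) * sum(bi**2 for bi in b))
    return num / den

def max_min(a, b):
    return sum(min(ai, bi) for ai, bi in zip(a, b)) / \
           sum(max(ai, bi) for ai, bi in zip(a, b))

print(round(cosine_amplitude(x[0], x[1]), 3))   # r12 ≈ 0.836
print(round(max_min(x[0], x[1]), 3))            # r12 ≈ 0.538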
* The membership function of R = "IF x is A THEN y is B" is given by
μR(x, y) = max(min(μA(x), μB(y)), 1 - μA(x))
IF-THEN-ELSE (compound implication):
IF x is A THEN y is B ELSE y is C
R = (A × B) ∪ (A^c × C)
The membership function is given by
μR(x, y) = max(min(μA(x), μB(y)), min(1 - μA(x), μC(y)))
Lecture-21
Fuzzy Inference: Also referred to as approximate reasoning, it refers to the
computational procedures used for evaluating linguistic descriptions.
Two important inferring procedures are:
1. Generalized Modus Ponens (GMP)
2. Generalized Modus Tollens (GMT)
GMP is formally stated as:
1. Rule: IF x is A THEN y is B
2. Fact: x is A'
3. Conclusion: y is B'
Here A, B, A', B' are fuzzy sets, with B' = A' ∘ R(x, y),
where μB'(y) = max(min(μA'(x), μR(x, y)))
and μA'(x) is the membership function of A', μR(x, y) is the membership function of R.
GMT:
1. Rule: IF x is A THEN y is B
2. Fact: y is B'
3. Conclusion: x is A'
A' = B' ∘ R(x, y)
In terms of membership functions:
μA'(x) = max(min(μB'(y), μR(x, y)))
Fact: x is A'
Rule: if x is A then y is B
Consequence: y is B'
B' = A' ∘ R = A' ∘ (A → B)
Degree of compatibility:
Compare the known facts with the rule antecedents to find the degree of compatibility
with respect to each antecedent MF.
Firing strength:
Combine the degrees of compatibility with respect to the antecedent MFs in a rule using
fuzzy AND or OR operators to form a firing strength.
Qualified (induced) consequent MF:
Apply the firing strength to the consequent MF of a rule to generate the qualified
consequent MF.
Overall output MF:
Aggregate all qualified consequent MFs to obtain an overall output MF.
Q.3 Apply the generalized modus ponens rule to deduce "Rotation is quite slow" given:
1. If the temperature is high then the rotation is slow.
2. The temperature is very high.
Let H (high), VH (very high), S (slow) and QS (quite slow) be fuzzy sets, with
{(70, 1), (80, 1), (90, 0.3)} among their definitions over the temperature universe.
1. Rule: IF X is H THEN Y is S
   ⟹ μR(x, y) = max(min(μH(x), μS(y)), 1 - μH(x))
2. Fact: X is VH
Conclusion: Y is QS, where
QS = VH ∘ R(X, Y)
R(X, Y) = (H × S) ∪ (H^c × Y)
Lecture-22
*Find the appropriate voltage for this temperature using max-product
composition.
In a computer system there is a relationship between the CPU board temperature and the
power supply voltage. Let us consider the following relation: if the temperature
(in degrees Fahrenheit) is high then the power supply voltage (in volts) will drop or become low.
Let
A = temperature is high
B = voltage is low
A → B = if the temperature is high then the voltage will be low.
The following membership functions might be appropriate for these two variables:
A = {0.1/50 + 0.5/75 + 0.7/100 + 0.9/125 + 1/150}
B = {1/4.0 + 0.8/4.25 + 0.5/4.5 + 0.2/4.75 + 0/5.0}
a. Find A → B.
b. Suppose we consider another temperature A'; find the corresponding voltage using
max-product composition.
Truth-value example: Let P: "Mary is efficient", T(P) = 0.8, and Q: "Ram is efficient", T(Q) = 0.65.
I. Mary is not efficient: T(¬P) = 1 - T(P)
   = 1 - 0.8
   = 0.2
II. Mary is efficient and so is Ram: P ∧ Q
    T(P ∧ Q) = min(T(P), T(Q))
             = min(0.8, 0.65)
             = 0.65
III. Either Mary or Ram is efficient: P ∨ Q
     T(P ∨ Q) = max(T(P), T(Q))
              = max(0.8, 0.65)
              = 0.8
IV. If Mary is efficient then so is Ram:
    P ⟹ Q = ¬P ∨ Q
    T(P ⟹ Q) = max(1 - T(P), T(Q))
             = max((1 - 0.8), 0.65)
             = max(0.2, 0.65)
             = 0.65
2. X = {a, b, c, d}, Y = {1, 2, 3, 4}
Given the rule "If x is A then y is B", the relation is
R1 = (A × B) ∪ (A^c × Y)
and its relation matrix (R1) is obtained by taking, entry by entry, the maximum of the
matrices of A × B and A^c × Y.
2.) If x is A then y is B, else y is C:
R2 = (A × B) ∪ (A^c × C)
with the relation matrix (R2) obtained in the same way.
Lecture-23
Fuzzy Inference System:
The fuzzy inference system is a popular computing framework based on the
concepts of Fuzzy set theory, fuzzy if then else rules and fuzzy reasoning.
It has found application in a wide variety of fields. Because of its multidisciplinary
nature, the fuzzy inference system is known by many other names, such as fuzzy rule
based system, fuzzy expert system, fuzzy model, fuzzy associative memory, fuzzy
logic controller, or simply fuzzy system.
Fuzzy inference system consists of:
1. A rule base which contains selection of fuzzy rules.
2. Data base which define the membership function used in fuzzy rules.
3. A reasoning mechanism which performs the inference procedure upon the
rules and given facts to derive a reasonable outputs or conclusion.
A fuzzy inference system can take either fuzzy inputs or crisp inputs, but the outputs
it produces are almost always fuzzy sets.
But when it is necessary to have a crisp output especially when the fuzzy
inference system is used as a controller, defuzzification is used to extract crisp
value that best represents a fuzzy set.
With crisp input and output the fuzzy inference system implements a non linear
mapping from its input space to output space. This mapping is accomplished by
a number of fuzzy if then else rules, each of which describes the local behavior
of the mapping.
There are different types of fuzzy inference system or fuzzy controllers. In general,
the principal design elements in a general fuzzy logic control system are as
follows:
1. Fuzzification strategies and interpretation of a fuzzification operator or
fuzzifier.
2. Knowledgebase:-
a. Discretization / normalization of the universe of discourse.
b. Fuzzy partition of input and output spaces.
c. Completeness of the partition.
d. Choice of membership function of a primary fuzzy set.
3. Rule Base:-
a. Choice of process state (input) variable and control (output)
variable.
b. Source of derivation of fuzzy control rules.
c. Consistency interacting and completeness of fuzzy control rules.
4. Decision making:-
a. Definition of the fuzzy implication.
b. Interpretation of the sentence connective AND.
c. Interpretation of the sentence connective OR.
d. Inference mechanism.
5. Defuzzification strategies and interpretation of different defuzzification
operator.
Lecture-24
MAMDANI FUZZY MODELS:-
The Mamdani fuzzy inference system was proposed as the first attempt to control a
steam engine and boiler combination by a set of linguistic control rules obtained
from experienced human operators.
Two-rule Mamdani inference system:
Two crisp inputs x, y; overall output z.
Let min and max be adopted as the T-norm (fuzzy intersection) and T-conorm (fuzzy
union) operators.
For finding the relation, max-min composition is used.
For a two input system, four cases arise:
1. The inputs to the system are crisp values and we use the max-min inference
method.
2. The inputs to the system are crisp values and we use the max-product inference
method.
3. The inputs to the system are represented by fuzzy sets and we use the max-min
inference method.
4. The inputs to the system are represented by fuzzy sets and we use the max-product
inference method.
Case 1: The inputs x1 and x2 are crisp values.
Rule based system:
IF x1 is A1^k and x2 is A2^k THEN y^k is B^k
The membership functions for inputs x1 and x2 will be described by
μ(x1) = δ(x1 - input(i)) = 1 if x1 = input(i); 0 otherwise
μ(x2) = δ(x2 - input(j)) = 1 if x2 = input(j); 0 otherwise
μB^k(y) = max over k of (min[μA1^k(input(i)), μA2^k(input(j))]), k = 1, 2, ...
Case 2: Max-product (correlation product) inference:
μB^k(y) = max over k of [μA1^k(input(i)) · μA2^k(input(j))]
Case 3: input(i) and input(j) are fuzzy variables described by fuzzy
membership functions. The aggregated output by Mamdani implication is
given by:
μB^k(y) = max over k of [min{max_x [μA1^k(x) ∧ μinput(i)(x)], max_x [μA2^k(x) ∧ μinput(j)(x)]}]
Case 4: input(i) and input(j) are fuzzy variables and the inference
method is the correlation product method:
μB^k(y) = max over k of [max_x [μA1^k(x) · μinput(i)(x)] · max_x [μA2^k(x) · μinput(j)(x)]]
Example: In mechanics the energy of a moving body is called kinetic energy. If an object of
mass m (kilograms) is moving with a velocity v (metres/second) then the kinetic
energy K (in joules) is given by the equation K = ½mv². Model the mass and the
velocity as inputs to a system and the energy as the output, then observe the system for a
while and deduce the following two disjunctive rules by inference:
Rule 1: IF x1 is A1^1 (small mass) AND x2 is A2^1 (high velocity) THEN
y is B^1 (medium energy)
Rule 2: IF x1 is A1^2 (large mass) OR x2 is A2^2 (medium velocity)
THEN y is B^2 (high energy)
Lecture-25
Defuzzification Methods:-
Defuzzification is the conversion of a fuzzy quantity to a precise quantity. The
output of a fuzzy process can be the logical union of two or more membership
functions defined on the universe of discourse of the output variable.
1. Max membership principle:
Also known as the height method:
μC(z*) ≥ μC(z) for all z ∈ Z
2. Centroid method:
Also known as the centre of area or centre of gravity method:
z* = ∫ μC(z)·z dz / ∫ μC(z) dz
where ∫ denotes algebraic integration; in the discrete case
z* = Σ μC(z)·z / Σ μC(z)
3. Weighted average method:
This method is only valid for symmetrical output membership functions:
z* = Σ μC(z̄)·z̄ / Σ μC(z̄)
where Σ denotes the algebraic sum and z̄ is the maximum of each membership function.
The weighted average method is formed by weighting each membership
function in the output by its respective maximum membership value.
Example:
z* = (a·0.5 + b·0.9) / (0.5 + 0.9)
Since it is restricted to symmetrical membership functions, the values a and b are
the maxima of their respective shapes.
4. Mean-max membership:
This method is closely related to the first method, except that the location of the
maximum membership can be non-unique, i.e. the maximum membership can be
a plateau rather than a single point:
z* = (a + b)/2
where a and b are the end points of the plateau.
5. Centre of sums:
This is faster than many defuzzification methods. The process
involves the algebraic sum of the individual output fuzzy sets, say C1 and C2,
instead of their union. One drawback is that intersecting areas
are added twice.
z* = ∫Z z·Σ (k = 1 to n) μCk(z) dz / ∫Z Σ (k = 1 to n) μCk(z) dz
This method is similar to the weighted average method, except that in the centre of
sums method the weights are the areas of the respective membership functions,
whereas in the weighted average method the weights are the individual
membership values.
6. Bisector of area:
A vertical line z = Z_BOA partitions the region under the aggregated MF μ(z) into two
regions of equal area; Z_BOA satisfies
∫(α to Z_BOA) μ(z) dz = ∫(Z_BOA to β) μ(z) dz
where Z is the universe of discourse,
α = min{z | z ∈ Z},
β = max{z | z ∈ Z}.
7. Smallest of maximum:
Z_SOM is the minimum, in terms of magnitude, of the maximizing Z.
8. Largest of maximum:
Z_LOM is the maximum, in terms of magnitude, of the maximizing Z.
Methods (7) and (8) are not often used.
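A minimal Python sketch of two of the methods above (centroid and mean of maxima) on a sampled output MF; the sample points and membership values are illustrative.

# Defuzzification of a sampled output membership function (illustrative values).
z  = [1, 2, 3, 4, 5, 6]              # sample points of the output universe
mu = [0.0, 0.3, 0.8, 0.8, 0.4, 0.1]  # aggregated membership at each point

# Centroid (centre of gravity) method.
z_centroid = sum(zi * mi for zi, mi in zip(z, mu)) / sum(mu)

# Mean of maxima: average of the points where the membership is maximal.
m_max = max(mu)
plateau = [zi for zi, mi in zip(z, mu) if mi == m_max]
z_mom = sum(plateau) / len(plateau)

print(round(z_centroid, 3), z_mom)   # 3.667 and 3.5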
Lecture-26
Sugeno Fuzzy Model:-
Also known as the TSK fuzzy model, by Takagi, Sugeno & Kang.
It is an effort to develop a systematic approach to generating fuzzy rules from a
given input-output data set. A typical fuzzy rule in a Sugeno fuzzy model has the form
If x is A and y is B then z = f(x, y)
where A and B are the fuzzy sets in the antecedent and
z = f(x, y) is a crisp function in the consequent.
For a first-order two-rule model:
w1: z1 = p1·x + q1·y + r1
w2: z2 = p2·x + q2·y + r2
z = (w1·z1 + w2·z2) / (w1 + w2)
For fuzzy inputs (as in the kinetic-energy example), the inferred consequent of a rule is
C' = (A' × B') ∘ (A × B → C)
μC'(z) = ∨_x,y {[μA'(x) ∧ μB'(y)] ∧ [μA(x) ∧ μB(y) ∧ μC(z)]}
       = {∨_x [μA'(x) ∧ μA(x)]} ∧ {∨_y [μB'(y) ∧ μB(y)]} ∧ μC(z)
       = (w1 ∧ w2) ∧ μC(z)
Here, w1 = ∨_x [μA'(x) ∧ μA(x)] and w2 = ∨_y [μB'(y) ∧ μB(y)].
Let input mass = 0.35 kg and input velocity = 55 m/s;
"input mass" = approximately 0.35 kg,
"input velocity" = approximately 55 m/s.
Two-input single-output Sugeno fuzzy model:
E.g.: If x is small and y is small then z = -x + y + 1
If x is small and y is large then z = -y + 3
If x is large and y is small then z = -x + 1
If x is large and y is large then z = x + y + 2
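A minimal Python sketch of how a Sugeno (TSK) output is formed: evaluate the crisp rule consequents and take their weighted average by firing strength. The firing strengths w1, w2 below are assumed numbers, not derived from specific antecedent MFs.

# First-order Sugeno output: weighted average of crisp rule outputs.
x, y = 3.0, 2.0
w1, w2 = 0.4, 0.7                      # rule firing strengths (assumed)

z1 = -x + y + 1                        # rule 1 consequent: z = -x + y + 1
z2 = -y + 3                            # rule 2 consequent: z = -y + 3

z = (w1 * z1 + w2 * z2) / (w1 + w2)    # overall crisp output
print(z)                               # 0.636...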
Lecture-27
TSUKAMOTO Model:
In the Tsukamoto model, the consequent of each fuzzy if-then rule is represented by a fuzzy
set with a monotonic MF. As a result, the inferred output of each rule is defined as a crisp
value induced by the rule's firing strength. The overall output is the weighted average of
each rule's output. No time-consuming defuzzification process is required. It is not used
often since it is not as transparent as either the Mamdani or Sugeno models.
Projection of a fuzzy relation: let R = {[(x, y), μR(x, y)] | (x, y) ∈ X × Y}.
The projection of R onto X is μ(x) = max over y of μR(x, y);
the projection onto Y is μ(y) = max over x of μR(x, y);
and the total projection is the overall maximum of μR(x, y) (equal to 1 in the tabulated example).
Lecture-28
Genetic Algorithm:-
A genetic algorithm (GA) is a procedure that tries to mimic the genetic evolution of
consecutive generations in a population as it adapts to its environment. The
adaptation process is mainly applied through genetic inheritance from parents to
children and through survival of the fittest. Therefore, GA is a population-based
search methodology. Some pioneering works traced back to the middle of the 1960s
preceded the main presentation of GAs by Holland in 1975. However, GAs
were of limited application until Goldberg's multipurpose presentation of them in 1989 for
search, optimization, design and machine learning. Nowadays, GAs are
considered to be the most widely known and applicable type of metaheuristics.
A GA starts with an initial population whose elements are called chromosomes. A
chromosome consists of a fixed number of variables which are called genes. In
order to evaluate and rank chromosomes in a population, a fitness function based
on the objective function should be defined. Three operators must be specified to
construct the complete structure of the GA procedure: selection, crossover and
mutation operators. The selection operator deals with selecting an intermediate
population from the current one in order to be used by the other operators,
crossover and mutation. In this selection process, chromosomes with higher fitness
function values have a greater chance to be chosen than those with lower fitness
function values. Pairs of parents in the intermediate population of the current
generation are probabilistically chosen to be mated in order to reproduce
offspring. In order to increase the variability of the structure, the mutation operator is
applied to alter one or more genes of a probabilistically chosen chromosome.
Finally, another type of selection mechanism is applied to copy the surviving
members from the current generation to the next one.
GA operators; selection, crossover and mutation have been extensively studied.
Many effective setting of these operators have been proposed to fit a wide variety
of problems. More details about GA elements are discussed below before stating a
standard GA.
Fitness Function
Fitness function is a designed function that measures the goodness of a solution. It
should be designed in the way that better solutions will have a higher fitness
function value than worse solutions. The fitness function plays a major role in the
selection process.
Coding
Coding in GA is the form in which chromosomes and genes are expressed. There
are mainly two types of coding; binary and real. The binary coding was presented
in the GA original presentation in which the chromosome is expressed as a binary
string. Therefore, the search space of the considered problem is mapped into a
space of binary strings through a coder mapping. Then, after reproducing an
offspring, a decoder mapping is applied to bring them back to their real form in
order to compute their fitness function values. Actually, many researchers still
believe that the binary coding is the ideal. However, the real coding is more
applicable and easy in programming. Moreover, it seems that the real coding fits
the continuous optimization problems better than the binary coding.
Selection
Consider a population P; the selection operator selects a set P' ⊆ P of the
chromosomes that will be given the chance to be mated and mutated. The size of P'
is the same as that of P, but more fit chromosomes in P are chosen with higher
probability to be included in P'. Therefore, the most fit chromosomes in P may be
represented by more than one copy in P', and the least fit chromosomes in P may
not be represented at all in P'. Consider the population P = {x1, x2, ..., xN}. The
difference between selection operators lies in the way of computing the probability
of including a copy of chromosome xi ∈ P in the set P', which is denoted by
ps(xi). Using these probabilities, the population is mapped onto a roulette wheel,
where each chromosome xi is represented by a space that proportionally
corresponds to ps(xi). Chromosomes in the set P' are chosen by repeatedly spinning
the roulette wheel until all positions in P' are filled.
Lecture-29
Crossover and Mutation
The crossover operator aims to interchange the information and genes between
chromosomes. Therefore, the crossover operator combines two or more parents to
reproduce new children; one of these children may hopefully collect all the good
features that exist in its parents. The crossover operator is not typically applied to all
parents but is applied with probability pc, which is normally set equal to 0.6.
Actually, the crossover operator plays a major role in GA, so defining a proper
crossover operator is highly needed in order to achieve a better performance of the
GA. Different types of crossover operators have been studied. The mutation operator
alters one or more genes in a chromosome. The mutation operator aims to achieve some
stochastic variability of the GA in order to get a quicker convergence. The probability
pm of applying the mutation operator is usually set to be small, normally 0.01.
Standard Genetic Algorithm:
1. Initialization. Generate an initial population P0. Set the crossover and mutation
   probabilities pc ∈ (0, 1) and pm ∈ (0, 1), respectively. Set the generation counter
   t := 1.
2. Selection. Evaluate the fitness function F at all chromosomes in Pt. Select an
   intermediate population P′t from the current population Pt.
3. Crossover. Associate a random number from (0, 1) with each chromosome in P′t and
   add this chromosome to the parents pool set SPt if the associated number is less
   than pc. Repeat the following Steps 3.1 and 3.2 until all parents in SPt are mated:
   3.1. Choose two parents p1 and p2 from SPt. Mate p1 and p2 to reproduce children
        c1 and c2.
   3.2. Update the children pool set SCt through SCt := SCt ∪ {c1, c2} and update
        SPt through SPt := SPt \ {p1, p2}.
4. Mutation. Associate a random number from (0, 1) with each gene in each chromosome
   in P′t, mutate this gene if the associated number is less than pm, and add the
   mutated chromosome to the children pool set SCt.
5. Stopping Conditions. If the stopping conditions are satisfied, then terminate.
   Otherwise, select the next generation Pt+1 from Pt ∪ SCt. Set SCt to be empty,
   set t := t + 1, and go to Step 2.
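The loop above can be condensed into the short Python sketch below. The toy problem (maximizing the number of ones in a bit string), the population size, and the helper names are all assumptions chosen only to make the example runnable; the overall structure follows the steps of the note loosely:

```python
import random

POP_SIZE, N_BITS, PC, PM, MAX_GEN = 20, 10, 0.6, 0.01, 50

def fitness(chrom):
    # toy fitness: number of ones in the bit string (an assumption for this demo)
    return sum(chrom)

def select(population):
    # Step 2: roulette-wheel selection of an intermediate population P't
    fits = [fitness(c) + 1e-9 for c in population]
    return random.choices(population, weights=fits, k=len(population))

def crossover(p1, p2):
    # Step 3.1: one-point crossover producing two children
    point = random.randint(1, N_BITS - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom):
    # Step 4: flip each gene independently with probability PM
    return [1 - g if random.random() < PM else g for g in chrom]

population = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for t in range(MAX_GEN):
    intermediate = select(population)
    parents = [c for c in intermediate if random.random() < PC]
    children = []
    for p1, p2 in zip(parents[::2], parents[1::2]):
        children.extend(crossover(p1, p2))
    children.extend(mutate(c) for c in intermediate)
    # Step 5: next generation chosen from Pt U SCt (here: the fittest POP_SIZE members)
    population = sorted(population + children, key=fitness, reverse=True)[:POP_SIZE]

print("best chromosome:", max(population, key=fitness))
```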
Soft Computing
Lecture-30
Neural Network:
Work on artificial neural networks is commonly referred to as Neural Networks.
Motivation: The human brain computes in an entirely different fashion from the
conventional digital computer. It is highly complex, nonlinear, and is a parallel
information-processing system. It has the capability to organize its structural
constituents, known as neurons, so as to perform certain computations (e.g. pattern
recognition, perception, etc.) many times faster than the fastest digital computer in
existence today.
E.g. human vision: the visual system provides a representation of the environment
around us and, more importantly, the information we need to interact with the
environment, such as recognizing a familiar face in an unfamiliar surrounding.
A brain has great structure and the ability to build up its own rules through
what we usually refer to as experience.
A developing neuron is synonymous with a plastic brain: plasticity permits the
developing nervous system to adapt to its surrounding environment. In general,
plasticity is an essential property for the functioning of any machine for
information processing.
A Neural Network is a machine that is designed to model the way in which the
human brain performs a particular task or function of interest.
The network can be realized either by using electronic components or simulated in
software on a digital computer.
An important class of neural networks is one that performs useful computation
through a process of learning.
A neural network viewed as an adaptive machine can be defined as:
A Neural Network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential
knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the N/W from its environment through a learning
process.
2. Inter neuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge.
The procedure used to perform the learning process is called a learning algorithm,
the function of which is to modify the synaptic weights of the network in an
orderly fashion to attain a desired design objective.
Benefits of Neural Networks:
A neural network derives its computing power through:
1. Its massively parallel distributed structure.
2. Its ability to learn and therefore generalize.
Generalization refers to the neural network producing reasonable outputs for
inputs not encountered during training (learning).
**A neural network cannot provide the solution by working individually. Rather, it
needs to be integrated into a consistent system engineering process.
Soft Computing
Lecture-31
NN offers the following useful properties:
1. Nonlinearity:- A neuron can be linear or nonlinear, so a network of neurons can
   itself be linear or nonlinear.
2. Input-Output Mapping:- A NN can be trained using sample data or task
   examples. Each example consists of a unique input signal and a corresponding
   desired response. The network is trained by adjusting the weights to minimize
   the difference between the desired output and the actual output.
3. Adaptivity:- Neural networks have a built-in capability to adapt their
   synaptic weights to changes in the surrounding environment.
In particular, a neural network trained to operate in a specific environment can
easily be retrained to deal with minor changes in the operating environmental
conditions. Also, if the NN is meant to function in a nonstationary environment,
it can be designed to change its synaptic weights in real time. This makes it a
useful tool in adaptive pattern classification, adaptive signal processing, and
adaptive control.
** To realize the full benefit of adaptivity, the principal time constants of the
system should be long enough for the system to ignore spurious disturbances and
yet short enough to respond to meaningful changes in the environment.
4. Evidential response:- In the context of pattern classification, a neural
   network can be designed to provide information not only about which pattern to
   select but also about the confidence in the decision made. The latter
   information can be used to reject ambiguous patterns.
5. Contextual Information:- Knowledge is represented by the very structure and
   activation state of a neural network. Every neuron in the network is
   potentially affected by the global activity of all other neurons in the network.
6. Fault Tolerance:- A neural network, implemented in hardware form, has the
   potential to be inherently fault tolerant, or capable of robust computation, in
   the sense that its performance degrades gracefully under adverse operating
   conditions. Thus, in principle, a neural network exhibits a graceful degradation
   in performance rather than catastrophic failure.
7. VLSI Implementability:- The massively parallel nature of a neural network makes
   it potentially fast for the computation of certain tasks. This same feature
   makes a neural network well suited for implementation using very-large-scale
   integrated (VLSI) technology.
8. Uniformity of Analysis & Design:- Neural networks enjoy universality as
   information processors, i.e. the same notation is used in all domains involving
   the application of NNs.
9. Neurobiological Analogy:- The design of a neural network is motivated by
   analogy with the brain, which is living proof that fault-tolerant parallel
   processing is not only physically possible but also fast and powerful.
Soft Computing
Lecture-32
Neural Networks:
The brain contains about 10^10 basic units called neurons. A neuron is a small cell
that receives electro-chemical signals from various sources and in turn responds by
transmitting electrical impulses to other neurons.
Some neurons perform input operations and are referred to as afferent cells; some
perform output operations and are referred to as efferent cells; the remaining form
part of an interconnected network of neurons which are responsible for signal
transformation and storage of information.
Structure of a neuron: (figure)
Dendrites: Behave as input channels, i.e. all inputs from other neurons arrive
through the dendrites.
Axon: Is electrically active and serves as an output channel. Axons are nonlinear
threshold devices which produce a voltage pulse called the action potential. If the
cumulative inputs received by the soma raise the internal electric potential of the
cell, known as the membrane potential, above a threshold, then the neuron fires by
propagating the action potential down the axon to excite or inhibit other neurons.
Synapse or Synaptic Junction:
The axon terminates in a specialized contact called a synapse or synaptic junction
that connects the axon to the dendritic links of other neurons.
The synaptic junction, which is a very minute gap at the end of the dendritic
link, contains a neurotransmitter fluid.
The size of the synaptic junction is believed to be related to learning. Thus,
synapses with a large area are thought to be excitatory while those with a small
area are believed to be inhibitory.
Model of an Artificial Neuron:
The human brain is a highly interconnected network of simple processing elements
called neurons. The behavior of a neuron can be captured by a simple model termed
an artificial neuron.
In artificial neurons, excitation and inhibition (acceleration and retardation of
signals) are modeled by weights. An efficient synapse which transmits a stronger
signal will have a correspondingly larger weight.
I = w1 x1 + w2 x2 + ... + wn xn = Σ (i=1 to n) wi xi
To generate the final output, the sum is passed on to a nonlinear filter called the
activation function, transfer function, or squash function:
Y = φ(I)
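A minimal Python sketch of this weighted-sum-plus-activation model is shown below; using the sigmoid as φ is an assumption here (the candidate activation functions are introduced in the next lecture):

```python
import math

def artificial_neuron(inputs, weights, phi):
    """Compute Y = phi(I), where I = sum of w_i * x_i."""
    I = sum(w * x for w, x in zip(weights, inputs))
    return phi(I)

sigmoid = lambda I: 1.0 / (1.0 + math.exp(-I))
print(artificial_neuron([0.5, 0.2, 0.9], [0.4, -0.7, 0.3], sigmoid))
```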
Soft Computing
Lecture-33
Commonly used activation functions:
1. Thresholding function:- The sum I is compared with a threshold value θ. If I is
greater than θ, the output is 1, else it is 0:
Y = φ( Σ (i=1 to n) wi xi − θ )
where φ is the step function, also known as the Heaviside function, such that
φ(I − θ) = 1 if I > θ
         = 0 if I ≤ θ
2. Signum function:- φ(I) = +1 if I > θ
                         = −1 if I ≤ θ
3. Sigmoidal function:- This is a continuous function that varies gradually between
the asymptotic values 0 and 1 (or −1 and +1), given by
φ(I) = 1 / (1 + e^(−λI))
where λ is the slope parameter, which adjusts the abruptness of the function as it
changes between the two asymptotic values.
4. Piecewise-Linear Function:-
φ(v) = 1          if v ≥ 1/2
     = v + 1/2    if −1/2 < v < 1/2
     = 0          if v ≤ −1/2
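These four activation functions can be written directly in Python. The sketch below assumes θ = 0 for the threshold and signum functions and uses λ as the sigmoidal gain:

```python
import math

def threshold(I, theta=0.0):
    return 1.0 if I > theta else 0.0

def signum(I, theta=0.0):
    return 1.0 if I > theta else -1.0

def sigmoid(I, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * I))

def piecewise_linear(v):
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v + 0.5
```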
Neural Network Architecture:
A neural network is a data-processing system consisting of a large number of simple,
highly interconnected processing elements.
Soft Computing
Lecture-34
Network Properties:
Generally, an artificial neural network can be represented using a directed graph.
The topology of a neural network refers to its framework as well as its
interconnection scheme. The framework is specified by the number of layers and the
number of nodes per layer. The layers include:
1. The input layer:- the nodes in it, called input units, merely transmit the signal.
2. The hidden layer:- performs useful intermediary computation.
3. The output layer:- encodes the possible output values.
Human brain:
Block diagram representation of the nervous system: (figure)
Central to the system is the brain, represented by a neural network, which
continually receives information, processes it, and makes appropriate decisions.
Forward transmission takes the information-bearing signal through the system.
Backward transmission provides feedback to the system.
Neurons are the structural constituents of the brain.
Typically a neuron is 5 to 6 orders of magnitude slower than silicon logic gates:
events in a silicon chip happen in about 10^-9 s, whereas in a neuron they take
about 10^-3 s. The brain makes up for the relatively slow rate of operation of a
neuron by having a truly staggering number of neurons with massive interconnections
between them.
Energy efficiency: the brain uses approximately 10^-16 joules per operation per
second, whereas the corresponding value for the best computers in use today is
about 10^-6 joules per operation per second.
A neuron is an information-processing unit that is fundamental to the operation of
a neural network. It has three basic elements:
1. A set of synapses or connecting links, each of which is characterized by a
   weight or strength of its own.
2. An adder for summing the input signals.
3. An activation function for limiting the output of a neuron, also referred to as
   a squashing function.
Soft Computing
Lecture-35
Structural Organization of levels in the brain:
Neural Network Viewed as directed Graph:
A neural network is a directed graph consisting of nodes interconnected by synaptic
and activation links, and is characterized by four properties:
1. Each neuron is represented by a set of linear synaptic links, an externally
   applied bias, and possibly a nonlinear activation link. The bias is represented
   by a synaptic link connected to an input fixed at +1.
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the induced local field of the
   neuron in question.
4. The activation link squashes the induced local field of the neuron to produce
   an output.
Soft Computing
Lecture-36
Feedback:- Feedback is said to exist in a dynamic system whenever the output of an
element in the system influences in part the input applied to that particular
element, thereby giving rise to one or more closed paths for the transmission of
signals around the system.
Consider a single-loop feedback system with forward operator A and feedback
operator B:
x′j(n) = xj(n) + B[yk(n)]
yk(n) = A[x′j(n)] = A[xj(n) + B[yk(n)]]
yk(n) − AB[yk(n)] = A[xj(n)]
yk(n) = A / (1 − AB) · xj(n)
Here A/(1 − AB) is the closed-loop operator and AB is the open-loop operator.
Let A be a fixed weight w and B the unit-delay operator z^-1, whose output is
delayed with respect to the input by one time unit. Then
A / (1 − AB) = w / (1 − w z^-1) = w (1 − w z^-1)^-1
Using the binomial expansion,
A / (1 − AB) = w Σ (l=0 to ∞) w^l z^-l
so that
yk(n) = w Σ (l=0 to ∞) w^l z^-l [xj(n)]
By the definition of z^-1, z^-l [xj(n)] = xj(n − l), where xj(n − l) is a sample of
the input signal delayed by l time units. Hence
yk(n) = Σ (l=0 to ∞) w^(l+1) xj(n − l)
The dynamic behavior of the feedback system is controlled by the weight w:
1. |w| < 1: the output signal is exponentially convergent; the system is stable.
   This corresponds to a system with infinite (fading) memory.
2. |w| ≥ 1: the output signal is divergent; the system is unstable.
3. |w| = 1: the divergence of the signal is linear.
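The effect of the weight w can be checked numerically. The short sketch below (an illustration, not part of the note) applies yk(n) = Σ w^(l+1) xj(n−l) to a unit impulse and prints the response for a stable and an unstable choice of w:

```python
def impulse_response(w, n_steps=10):
    """Response of y_k(n) = sum_{l>=0} w**(l+1) * x_j(n-l) to a unit impulse."""
    x = [1.0] + [0.0] * (n_steps - 1)          # x_j(0) = 1, zero afterwards
    y = []
    for n in range(n_steps):
        y.append(sum(w ** (l + 1) * x[n - l] for l in range(n + 1)))
    return y

print(impulse_response(0.5))   # |w| < 1: exponentially decaying (stable)
print(impulse_response(1.5))   # |w| > 1: exponentially growing (unstable)
```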
Network Architecture:-
The manner in which the neurons of a neural network are structured is intimately
linked with the learning algorithm used to train the network.
Three Fundamental Classes of network architecture:-
1. Single-layer feedforward network (no computation is performed at the input layer):
   input layer (source nodes) feeding an output layer (neurons).
Soft Computing
Lecture-37
2. Multilayer Feedforward Networks:-
The second class of feedforward network distinguishes itself by the presence of one
or more hidden layers, whose computation nodes are called hidden neurons or hidden
units. The function of the hidden neurons is to intervene between the external
input and the network output in some useful manner.
The presence of one or more hidden layers enables the network to extract
higher-order statistics. This is particularly valuable when the size of the input
layer is large.
The network acquires a global perspective despite its local connectivity, owing to
the extra set of synaptic connections and the extra dimension of neural interaction.
The layers are the input layer, the layer(s) of hidden neurons, and the layer of
output neurons. Such a network is referred to as an m-h1-h2-q network, where
m = number of input (source) nodes,
h1 = number of neurons in the first hidden layer,
h2 = number of neurons in the second hidden layer,
q = number of output neurons.
3. Recurrent Network:-
A recurrent network has at least one feedback loop.
Adaptive networks: every node performs a specific function and the nodes are
associated with certain parameters; if the parameters change, the overall behavior
also changes. Hence the node function depends on the parameter values.
If the parameter set of a node is not empty, the node is represented by a square;
if the parameter set is empty and the node performs a fixed function, it is
represented by a circle.
Based on the type of connections, adaptive networks can be classified as:
1. Feedforward networks: all connections are in one direction.
   a. Single layer
   b. Multilayer
2. Recurrent networks: there are feedback connections or loops.
Types of connections:
1. Inter-layer connection: a connection between nodes of adjacent layers.
2. Intra-layer connection: a connection between nodes within the same layer.
3. Supra-layer connection: a connection between nodes in distant (non-adjacent)
   layers.
4. High-order connection: a connection that combines inputs from more than one
   layer.
Soft Computing
Perceptron:
The perceptron is a computational model of the retina of the eye and hence is named
a perceptron.
The network comprises three types of units: sensory (S) units, association (A)
units, and response (R) units. The sensory units (photodetectors) are randomly
connected to the association units. The A units compute features or predicates; the
predicates examine the output of the S units for specific features of the image.
The response (R) units comprise the pattern recognizers or perceptrons.
Output: yj = f(netj) = 1 if netj > 0
                     = 0 otherwise
where netj = Σ (i=1 to n) xi wij
The training algorithm for the perceptron is a form of supervised learning, where
the weights are adjusted to minimize the error whenever the computed output does
not match the target output.
Soft Computing
Lecture-38
Basic learning algorithm for the perceptron:
1. If the output is correct, no adjustment of the weights is made:
   Wij(k+1) = Wij(k)
2. If the output is 1 but should have been 0, the weights on the active input links
   are decreased:
   Wij(k+1) = Wij(k) − α·xi
3. If the output is 0 but should have been 1, the weights on the active input links
   are increased:
   Wij(k+1) = Wij(k) + α·xi
where Wij(k+1) is the new adjusted weight, Wij(k) is the old weight, and α is the
learning rate.
A small α gives slow learning; a large α gives fast learning. If α is kept constant,
the learning algorithm is termed the fixed-increment algorithm.
The value of the learning rate can be constant throughout the training, or it can be
a varying quantity proportional to the error. A learning rate proportional to the
error leads to faster convergence but can cause unstable learning.
The update Δwi = α·ti·xi is also used, where α is again the learning rate and ti the
target output.
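A short Python sketch of this fixed-increment perceptron training rule is given below; the AND-gate training data and the learning-rate value are assumptions chosen for illustration:

```python
def train_perceptron(samples, n_inputs, alpha=0.1, epochs=20):
    """samples: list of (x, target) pairs with x a tuple of inputs, target 0 or 1."""
    w = [0.0] * n_inputs
    b = 0.0                                   # bias handled as a separate weight
    for _ in range(epochs):
        for x, t in samples:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if net > 0 else 0
            if y == t:
                continue                      # rule 1: correct output, no change
            sign = -1 if y == 1 else +1       # rule 2: decrease, rule 3: increase
            w = [wi + sign * alpha * xi for wi, xi in zip(w, x)]
            b += sign * alpha
    return w, b

# example: logical AND (linearly separable, so the perceptron converges)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(data, n_inputs=2))
```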
Soft Computing
Perceptron and linearly separable tasks:
A perceptron cannot handle tasks which are not linearly separable, i.e. tasks in
which the points in a 2-D plane cannot be separated by a straight line.
The perceptron cannot find weights for classification problems that are not
linearly separable.
An example is the XOR problem:
X Y O/p
0 0 0   (even parity)
1 1 0   (even parity)
0 1 1   (odd parity)
1 0 1   (odd parity)
The problem is to classify the inputs as even or odd parity.
This is impossible, since the perceptron is unable to find a line which separates
the odd-parity input patterns from the even-parity input patterns.
Why can such a line not be found?
Consider a perceptron with a bias input x0 = 1 and weights w0, w1, w2. Its net input
is
I = w0 + w1 x1 + w2 x2
Setting I = 0 gives w0 + w1 x1 + w2 x2 = 0, which represents the equation of a
straight line in the (x1, x2) plane. This straight line acts as the decision
boundary separating class c1 from class c2, and no single such line can separate
the XOR patterns.
Adaline Network:
The Adaline (Adaptive Linear Neuron) network:
- has only one output neuron;
- output values are bipolar (−1 or +1);
- inputs may be binary, bipolar, or real valued.
If the weighted sum of the inputs is greater than zero, the output is +1, otherwise
it is −1.
The supervised learning algorithm adopted by the network is known as the least mean
square (LMS) or delta rule:
Wi(new) = Wi(old) + α (t − y) xi
where t is the target output, y the actual output, and α the learning rate. The rule
is similar to the perceptron learning algorithm.
(Figure: inputs x1, x2, ..., xn feed a weighted sum, followed by a thresholding
function that produces the output y.)
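The delta (LMS) rule can be sketched as below. The bipolar OR data set and the learning rate are assumptions used only to make the snippet runnable; here y in the update is taken as the linear activation (the net sum), which is the usual LMS choice, while the thresholded output is used only for prediction:

```python
def train_adaline(samples, n_inputs, alpha=0.1, epochs=30):
    """Delta (LMS) rule: w_i <- w_i + alpha * (t - y) * x_i."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear activation
            w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
            b += alpha * (t - y)
    return w, b

def predict(w, b, x):
    # bipolar thresholding of the weighted sum
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# bipolar OR data (assumed example)
data = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 1)]
w, b = train_adaline(data, n_inputs=2)
print([predict(w, b, x) for x, _ in data])
```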
Soft Computing
Lecture-39
Madaline Network:
A Madaline is created by combining a number of Adalines.
The use of multiple Adalines helps conquer the problem of nonlinear separability;
a Madaline with two units can solve the XOR problem.
(Figure: two Adalines Z1 and Z2, each receiving x1, x2 and a bias x0 = 1, feed a
combining unit that produces the output Y.)
If Z1 and Z2 produce the same output, the final output is +1; if they differ, the
output is −1.
(Figure: the points (−1, +1) and (+1, −1) give positive output, while (−1, −1) and
(+1, +1) give negative output, so the network separates odd parity from even
parity.)
Soft Computing
Advantages of using the delta rule:
1. Simplicity.
2. Distributed learning: learning is not reliant on central control of the network;
   it can be performed locally at each node.
3. Online learning (or pattern-by-pattern learning): weights are adjusted after the
   presentation of each pattern.
Summary of the Perceptron Convergence Algorithm:
Variables and Parameters:
x(n) = (m+1)-by-1 input vector = [+1, x1(n), x2(n), ..., xm(n)]^T
w(n) = (m+1)-by-1 weight vector = [b(n), w1(n), w2(n), ..., wm(n)]^T
b(n) = bias
y(n) = actual response; d(n) = desired response
η = learning-rate parameter, a positive constant less than unity.
1. Initialization: set w(0) = 0; then perform the following computations for time
   steps n = 1, 2, ...
2. Activation: at time step n, activate the perceptron by applying the
   continuous-valued input vector x(n) and the desired response d(n).
3. Computation of actual response: compute the actual response of the perceptron
   y(n) = sgn[w^T(n) x(n)]
4. Adaptation of weights: update the weight vector of the perceptron
   w(n+1) = w(n) + η [d(n) − y(n)] x(n)
   where d(n) = +1 if x(n) belongs to class c1
              = −1 if x(n) belongs to class c2
5. Continuation: increment time step n by 1 and go to Step 2.
Exclusive OR Problem
X Y class
0 0 0
0 1 1
1 0 1
1 1 0        (not linearly separable)
The problem requires weights w0, w1, w2 such that
0·w1 + 0·w2 + w0 ≤ 0   ⇒   w0 ≤ 0
0·w1 + 1·w2 + w0 > 0   ⇒   w0 > −w2
1·w1 + 0·w2 + w0 > 0   ⇒   w0 > −w1
1·w1 + 1·w2 + w0 ≤ 0   ⇒   w0 ≤ −w1 − w2
These conditions are contradictory, so no such weights exist.
Soft Computing
Lecture-40
Features of ANN:
1. They learn by example.
2. They constitute a distributed, associative memory.
3. They are fault tolerant.
4. They are capable of pattern recognition.
The capability of learning by example means that examples taken from data are used
to organize the information into a useful form. This form constitutes a model that
represents the relationship between the input and output variables.
Associative Memory:-
An associative memory can be thought of as a mapping g between a pattern space R^m
and R^n. Thus, for α ∈ R^m and v ∈ R^n,
v = g(α)
Quite often g tends to be a nonlinear matrix-type operator, resulting in
v = M(α)
M has a different form for different memory models. The algorithm which computes M
is known as the recording or storage algorithm. Mostly M is computed using the
input pattern vectors.
Based on the principle of recall, associative memories may be classified into static
and dynamic models.
Static model (non-recurrent): the associated output v is obtained from the input α
in a single pass through M.
Dynamic model (recurrent): the output is fed back and the recall proceeds
iteratively.
For the static model, the associated v for the input pattern α is recognized in one
feedforward pass, whereas for a dynamic network the following recursion is applied
until an equilibrium state is reached:
v(k) = M(α(k))
v(k+1) = M(α(k), v(k))
Auto-correlators:
Also known as Hopfield Associative Memory.
First-order autocorrelators obtain their connection matrix (indicative of the
association of a pattern with itself) by multiplying a pattern's elements with each
other and summing over all stored patterns.
For a first-order autocorrelator that stores m bipolar patterns A1, A2, ..., Am, the
connection matrix is obtained by summing
T = Σ (i=1 to m) [Ai]^T [Ai]
Here T = [tij] is a p×p connection matrix and Ai ∈ {−1, 1}^p.
Recall equation of the autocorrelator: it is a vector-matrix multiplication followed
by a point-wise nonlinear threshold operation,
ai(new) = f( Σ (j=1 to p) aj(old) tij , ai(old) ),   i = 1, 2, ..., p        ... (I)
where Ai = (a1, a2, ..., ap) and the two-parameter threshold function is
f(α, β) = 1    if α > 0
        = β    if α = 0
        = −1   if α < 0                                                       ... (II)
Working:
Consider the following patterns:
A1 = (−1, 1, −1, 1)
A2 = (1, 1, 1, −1)
A3 = (−1, −1, −1, 1)
which are to be stored in an autocorrelator.
The connection matrix is
T = Σ (i=1 to 3) [Ai]^T [Ai]        (4×1 times 1×4)
  = [  3   1   3  −3 ]
    [  1   3   1  −1 ]
    [  3   1   3  −3 ]
    [ −3  −1  −3   3 ]
Recognition of stored patterns:
The autocorrelator is presented with a stored pattern A2 = (1, 1, 1, −1). Using the
recall equation (I):
a1(new) = f(3 + 1 + 3 + 3, 1) = f(10, 1) = 1
a2(new) = f(1 + 3 + 1 + 1, 1) = f(6, 1) = 1
a3(new) = f(3 + 1 + 3 + 3, 1) = f(10, 1) = 1
a4(new) = f(−3 − 1 − 3 − 3, −1) = f(−10, −1) = −1
The recalled vector (1, 1, 1, −1) is the same as A2.
Recognition of a noisy pattern:
Consider a vector A′ = (1, 1, 1, 1), which is a distorted presentation of one of the
stored patterns.
The Hamming distance measure can be used to find the proximity of the noisy vector
to the stored patterns.
The Hamming distance of a vector X = (x1, x2, ..., xn) from Y = (y1, y2, ..., yn) is
given by
HD(X, Y) = Σ (i=1 to n) |xi − yi|
Now
HD(A′, A1) = 4
HD(A′, A2) = 2
HD(A′, A3) = 6
It is evident that A′ is closest to A2.
Now let us use equation (I) and see whether the autocorrelator can retrieve the
pattern:
a1(new) = f(4, 1) = 1
a2(new) = f(4, 1) = 1
a3(new) = f(4, 1) = 1
a4(new) = f(−4, 1) = −1
The recalled vector (1, 1, 1, −1) = A2.
Hence, in the case of partial (noisy) vectors, an autocorrelator results in the
refinement of the pattern, i.e. removal of noise, to retrieve the closest matching
stored pattern.
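The stored-pattern and noisy-pattern recall above can be reproduced with a few lines of Python; this sketch simply implements equation (I) and the threshold function (II) for the three patterns of the example:

```python
import numpy as np

patterns = np.array([[-1, 1, -1, 1],
                     [ 1, 1,  1, -1],
                     [-1, -1, -1, 1]])
T = sum(np.outer(a, a) for a in patterns)         # connection matrix T

def recall(a_old):
    s = T @ a_old                                 # sum_j t_ij * a_j(old)
    # f(alpha, beta): 1 if alpha > 0, beta if alpha == 0, -1 if alpha < 0
    return np.where(s > 0, 1, np.where(s < 0, -1, a_old))

print(recall(np.array([1, 1, 1, -1])))            # stored pattern A2 -> A2
print(recall(np.array([1, 1, 1, 1])))             # noisy pattern    -> A2
```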
Soft Computing
Lecture-41
Heterocorrelators (Kosko's Discrete BAM):
Bidirectional associative memory (BAM) is a two-level nonlinear neural network based
on associative learning.
It operates as follows.
1. There are N training pairs {(A1, B1), (A2, B2), ..., (AN, BN)}, where
   Ai = (ai1, ai2, ..., ain) and Bi = (bi1, bi2, ..., bin),
   and each aij or bij is either OFF or ON.
   In binary mode, OFF = 0 and ON = 1; in bipolar mode, OFF = −1 and ON = 1.
2. The correlation matrix is
   M = Σ (i=1 to N) Xi^T Yi
   where Xi and Yi are the bipolar forms of Ai and Bi.
To retrieve the nearest (Ai, Bi) pair given any input (α, β), the recall proceeds as
follows. Starting with (α, β) as the initial condition, we determine a finite
sequence (α′, β′), (α″, β″), ... until an equilibrium point (αF, βF) is reached,
where
β′ = φ(αM)
α′ = φ(β′M^T)
and φ(F) = G = (g1, g2, ..., gn) for F = (f1, f2, ..., fn), with
gi = 1                            if fi > 0
   = 0 (binary) / −1 (bipolar)    if fi < 0
   = previous gi                  if fi = 0
Working:
Suppose N = 3, with the pattern pairs
A1 = (1 0 0 0 0 1)   B1 = (1 1 0 0 0)
A2 = (0 1 1 0 0 0)   B2 = (1 0 1 0 0)
A3 = (0 0 1 0 1 1)   B3 = (0 1 1 1 0)
Converting these to bipolar form:
X1 = (1 −1 −1 −1 −1 1)    Y1 = (1 1 −1 −1 −1)
X2 = (−1 1 1 −1 −1 −1)    Y2 = (1 −1 1 −1 −1)
X3 = (−1 −1 1 −1 1 1)     Y3 = (−1 1 1 1 −1)
The matrix M is calculated as
M = X1^T Y1 + X2^T Y2 + X3^T Y3
  = [  1   1  −3  −1   1 ]
    [  1  −3   1  −1   1 ]
    [ −1  −1   3   1  −1 ]
    [ −1  −1  −1   1   3 ]
    [ −3   1   1   3   1 ]
    [ −1   3  −1   1  −1 ]
Suppose α = X3; we wish to retrieve the associated pair Y3.
αM = (−6, 6, 6, 6, −6)
β′ = φ(αM) = (−1, 1, 1, 1, −1)
β′M^T = (−5, −5, 5, −3, 7, 5)
α′ = φ(β′M^T) = (−1, −1, 1, −1, 1, 1)
α′M = (−6, 6, 6, 6, −6)
φ(α′M) = (−1, 1, 1, 1, −1) = β′
Here β′ is the same as Y3 (in bipolar form). Hence (αF, βF) = (X3, Y3), which is the
desired result.
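The BAM recall above is easy to verify in Python; this sketch builds M from the bipolar pattern pairs of the example and iterates β = φ(αM), α = φ(βMᵀ) until the pair stops changing:

```python
import numpy as np

X = np.array([[ 1, -1, -1, -1, -1,  1],
              [-1,  1,  1, -1, -1, -1],
              [-1, -1,  1, -1,  1,  1]])
Y = np.array([[ 1,  1, -1, -1, -1],
              [ 1, -1,  1, -1, -1],
              [-1,  1,  1,  1, -1]])
M = sum(np.outer(x, y) for x, y in zip(X, Y))     # correlation matrix

def phi(f, previous):
    # bipolar threshold: +1 if f > 0, -1 if f < 0, keep previous value if f == 0
    return np.where(f > 0, 1, np.where(f < 0, -1, previous))

def bam_recall(alpha, beta):
    while True:
        beta_new = phi(alpha @ M, beta)
        alpha_new = phi(beta_new @ M.T, alpha)
        if np.array_equal(alpha_new, alpha) and np.array_equal(beta_new, beta):
            return alpha, beta
        alpha, beta = alpha_new, beta_new

print(bam_recall(X[2], np.zeros(5)))              # recovers (X3, Y3)
```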
Back propagation learning: It is a systematic method for training multilayer
artificial neural networks.
Consider a network with l input nodes, m hidden nodes and n output nodes, with [V]
the weights between the input and hidden layers and [W] the weights between the
hidden and output layers. (Figure: input neurons 1..l, hidden neurons 1..m, output
neurons 1..n, connected by the weights V and W.)
A training pair consists of the set of inputs (I1, I2, ..., Il) and the corresponding
set of outputs (O1, O2, ..., On).
Step 1: Input layer computation:
Considering a linear activation function at the input layer, the output of the input
layer equals its input:
{O}I = {I}I          (l×1)
Step 2: Hidden layer computation:
The input to the p-th hidden neuron is
IHp = V1p OI1 + V2p OI2 + ... + Vlp OIl,   p = 1, 2, ..., m
or, in matrix form,
{I}H = [V]^T {O}I          (m×1) = (m×l)(l×1)
Let us consider a sigmoidal activation at the p-th hidden neuron:
OHp = 1 / (1 + e^−(IHp − θHp))
where θHp is the threshold of the p-th hidden neuron and IHp its input. The output
of the hidden layer is the vector
{O}H = [ 1/(1 + e^−(IH1 − θH1)),  1/(1 + e^−(IH2 − θH2)),  ...,  1/(1 + e^−(IHm − θHm)) ]^T
Step 3: Output layer computation:
IOq = W1q OH1 + W2q OH2 + ... + Wmq OHm,   q = 1, 2, ..., n
{I}O = [W]^T {O}H          (n×1) = (n×m)(m×1)
Considering the sigmoidal function, the output of the q-th output neuron is given by
OOq = 1 / (1 + e^−(IOq − θOq))
and the output vector is
{O}O = [ ..., 1/(1 + e^−(IOq − θOq)), ... ]^T
Step 4: Calculation of error:
For the r-th output neuron, the error norm in its output is taken as
Er = (1/2) er^2 = (1/2) (Tr − OOr)^2
The square of the error is considered since, irrespective of whether the error is
positive or negative, only its absolute magnitude matters. The Euclidean norm of the
error E for the first training pattern is given by
E = (1/2) Σ (r=1 to n) (Tr − OOr)^2
If the same technique is used for all training patterns, we get
E(V, W) = Σ (j=1 to nset) Ej(V, W, I)
Soft Computing
Lecture-41
Training of the neural network:
The synaptic weighting and aggregation operations performed by the synapses and the
soma respectively provide a similarity measure between the input vector I and the
synaptic weights [V] and [W] (the accumulated knowledge base). When a new input
pattern that is significantly different from the previously learned patterns is
presented to the neural network, the similarity between this input and the existing
knowledge base is small.
Method of steepest descent:
The error surface is given by
E = Σ (p=1 to nset) Ep(V, W, I)
Multilayer feedforward networks with nonlinear activation functions have a
mean-squared error surface defined over the total Q-dimensional weight space R^Q. In
general, the error surface is complex and consists of many local and global minima.
In a back propagation (BP) network, at any given error value E, including in minima
regions, there are many permutations of weights which give rise to the same value of
E.
BP is never assured of finding the global minimum, unlike the single-layer delta
rule case.
At the start of the training process, the gradient descent search begins at a
location with error value E determined by the initial weight assignments W(0), V(0)
and the training pattern pairs (I^p, O^p), where
E = Σ (p=1 to nset) Ep = Σ (p=1 to nset) Σ (k=1 to n) (Tk^p − OOk^p)^2
During training, the gradient descent computations incrementally determine how the
weights should be modified at each location so as to move most rapidly in the
direction opposite to the gradient, i.e. along the direction of steepest descent.
After the incremental adjustments of weights have been made, the location is shifted
to a different point E on the error-weight surface. The process is repeated for each
training pattern, progressively shifting the location to lower and lower error
levels until a limit on the total number of training cycles is reached.
In moving down the error surface, the path followed is generally not the ideal path;
it depends on the shape of the surface and on the learning rate coefficient η.
For simplicity, assume the error surface to be truly spherical. The weight change
from point A to point B can be written as the vector
ΔAB = (Vi+1 − Vi) i + (Wi+1 − Wi) j = ΔV i + ΔW j
The gradient of E is given by
G = (∂E/∂V) i + (∂E/∂W) j
Hence the unit vector in the direction of the gradient is
e = (1/|G|) [ (∂E/∂V) i + (∂E/∂W) j ]
For steepest descent the weight change must point opposite to the gradient,
ΔAB = −k (1/|G|) [ (∂E/∂V) i + (∂E/∂W) j ]
and, since η = k/|G| is a constant,
ΔV = −η ∂E/∂V,    ΔW = −η ∂E/∂W
For the k-th output neuron, Ek is given by
Ek = (1/2) (Tk − OOk)^2
where Tk is the target output and OOk the computed output.
To compute ∂Ek/∂Wjk, the chain rule of differentiation is applied:
∂Ek/∂Wjk = (∂Ek/∂OOk) (∂OOk/∂IOk) (∂IOk/∂Wjk)
where
∂Ek/∂OOk = −(Tk − OOk)
The output of the k-th neuron is
OOk = 1 / (1 + e^−(IOk − θOk))
Hence
∂OOk/∂IOk = e^−(IOk − θOk) / [1 + e^−(IOk − θOk)]^2
          = OOk · e^−(IOk − θOk) / [1 + e^−(IOk − θOk)]
          = OOk · [ (1 + e^−(IOk − θOk)) − 1 ] / [1 + e^−(IOk − θOk)]
          = OOk (1 − OOk)
Next, since IOk = W1k OH1 + W2k OH2 + ... + Wmk OHm,
∂IOk/∂Wjk = OHj
Hence
∂Ek/∂Wjk = −(Tk − OOk) OOk (1 − OOk) OHj
and the weight change is
ΔWjk = −η ∂Ek/∂Wjk
In matrix form,
[ΔW] = η {O}H <d>          (m×n) = (m×1)(1×n)
where <d> is the row vector with elements
dk = (Tk − OOk) OOk (1 − OOk)
By applying the chain rule to the weights [V] between the input and hidden layers,
∂Ek/∂Vij = (∂Ek/∂OOk)(∂OOk/∂IOk)(∂IOk/∂OHi)(∂OHi/∂IHi)(∂IHi/∂Vij)
with
(∂Ek/∂OOk)(∂OOk/∂IOk) = −(Tk − OOk) OOk (1 − OOk)
∂IOk/∂OHi = Wik
∂OHi/∂IHi = OHi (1 − OHi)
∂IHi/∂Vij = OIj = IIj
Hence
(∂Ek/∂OOk)(∂OOk/∂IOk)(∂IOk/∂OHi) = −dk Wik = −ei
Let di* = ei OHi (1 − OHi). Then
∂Ek/∂Vij = −di* IIj
and, in matrix form,
[ΔV] = η {I}I <d*>
Effect of the learning rate η: it is kept constant through all iterations. Adding a
momentum term increases the rate of convergence:
[ΔW]^(t+1) = −η ∂E/∂W + α [ΔW]^t
[ΔV]^(t+1) = −η ∂E/∂V + α [ΔV]^t
where α is the momentum coefficient; its value should be positive but less than 1.
Soft Computing
Lecture-42
* Advantage of having hidden layers:
Hidden layers allow the ANN to develop its own internal representation of the
input-output mapping. Such a rich and complex internal representation allows the
network to learn any kind of mapping, not just linearly separable ones.
Back propagation algorithm:
Basic loop structure:
Initialize the weights
Repeat
For each training pattern
Train on that pattern
End
Until the error is acceptably low.
Step by step procedure:
Step 1. Normalize the inputs and outputs with respect to their maximum values (it
has been found that the neural network works better if the inputs and outputs lie
between 0 and 1). For each training pair, assume there are l inputs {I}I and n
outputs {O}O in normalized form.
Step 2. Assume the number of neurons in the hidden layer to lie between l < m < 2l.
Step 3. [v] represents the weights of the synapses connecting the input neurons and
hidden neurons, and [w] represents the weights of the synapses connecting the hidden
neurons and output neurons.
Initialize the weights to small random values, usually from −1 to 1. For
general-purpose problems, one may take
[v]^0 = [random weights]
[w]^0 = [random weights]
[Δv]^0 = [Δw]^0 = [0]
Step 4. For the training data, present one set of inputs and outputs. Present the
pattern {I}I to the input layer. Using the linear activation function, the output of
the input layer is
{O}I = {I}I          (l×1)
Step 5. Compute the input to the hidden layer by multiplying by the corresponding
synaptic weights:
{I}H = [v]^T {O}I          (m×1) = (m×l)(l×1)
Step 6. Let the hidden layer units evaluate their outputs using the sigmoidal
function (taking the threshold θ = 0):
{O}H = [ ..., 1/(1 + e^−IHi), ... ]^T
Step 7. Compute the input to the output layer by multiplying by the corresponding
synaptic weights:
{I}O = [w]^T {O}H          (n×1) = (n×m)(m×1)
Step 8. Let the output layer units evaluate their outputs using the sigmoidal
function:
{O}O = [ ..., 1/(1 + e^−IOj), ... ]^T
Step 9. Calculate the error, the difference between the network output and the
desired output, for the i-th training set as
Ep = √( Σj (Tj − OOj)^2 ) / n
Step 10. Find {d} as
{d} = [ ..., (Tk − OOk) OOk (1 − OOk), ... ]^T          (n×1)
Step 11. Find the [Y] matrix as
[Y] = {O}H <d>          (m×n) = (m×1)(1×n)
Step 12. Find
[Δw]^(t+1) = α [Δw]^t + η [Y]          (m×n)
Step 13. Find {e} = [w] {d}          (m×1) = (m×n)(n×1)
and
{d*} = [ ..., ei OHi (1 − OHi), ... ]^T          (m×1)
Find the [X] matrix as
[X] = {O}I <d*> = {I}I <d*>          (l×m) = (l×1)(1×m)
Step 14. Find [Δv]^(t+1) = α [Δv]^t + η [X]          (l×m)
Step 15. Find
[v]^(t+1) = [v]^t + [Δv]^(t+1)
[w]^(t+1) = [w]^t + [Δw]^(t+1)
Step 16. Find the error rate as
error rate = Σ Ep / nset
Step 17. Repeat steps 4-16 until the convergence in the error rate is less than the
tolerance value.
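The whole training loop (steps 4-17) can be condensed into the NumPy sketch below. The toy XOR data set, the layer sizes, the learning rate and the momentum value are assumptions chosen only to make the example runnable; thresholds are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs  (assumed)
T = np.array([[0], [1], [1], [0]], dtype=float)               # targets (assumed)

l, m, n = 2, 4, 1                       # input, hidden, output sizes
eta, alpha = 0.5, 0.9                   # learning rate and momentum factor
v = rng.uniform(-1, 1, (l, m));  dv = np.zeros((l, m))
w = rng.uniform(-1, 1, (m, n));  dw = np.zeros((m, n))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    for x, t in zip(X, T):
        o_i = x                                         # step 4: input layer output
        o_h = sig(v.T @ o_i)                            # steps 5-6: hidden layer
        o_o = sig(w.T @ o_h)                            # steps 7-8: output layer
        d = (t - o_o) * o_o * (1 - o_o)                 # step 10
        dw = alpha * dw + eta * np.outer(o_h, d)        # steps 11-12
        e = w @ d
        d_star = e * o_h * (1 - o_h)                    # step 13
        dv = alpha * dv + eta * np.outer(o_i, d_star)   # steps 13-14
        v, w = v + dv, w + dw                           # step 15

print(sig(w.T @ sig(v.T @ X.T)).T.round(2))             # should approach the XOR targets
```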
Soft Computing
Lecture-43
Effect of tuning parameters of the back propagation neural
network:
Momentum factor
Learning coefficient
Parameter sigmoidal gain
Threshold value
Momentum factor α: It has a significant role in deciding the values of the learning
rate that will produce rapid learning. It determines the amount of smoothing applied
to the change in weights or biases:
- if the momentum factor is zero, the smoothing is minimum and the entire weight
  adjustment comes from the newly calculated change;
- if the momentum factor is one, the new adjustment is ignored and the previous one
  is repeated;
- between 0 and 1, the weight adjustment is smoothed by an amount proportional to
  the momentum factor. A momentum factor of 0.9 has been found suitable for most
  problems.
The main role of the momentum factor is to increase the rate of learning.
Learning Coefficient η: The choice of η depends on the number and types of input
patterns. A working estimate is
η = 1.5 / √(N1² + N2² + ... + Nm²)
where Ni is the number of patterns of type i and m is the number of different
pattern types.
Cumulative Update of Weights:
In cumulative BP, individual weight changes are accumulated over an epoch of training
and summed; these cumulative weight changes are then applied to the individual
weights. With cumulative updating it may be difficult to spot pattern types; the
target output is used to determine a pattern's type.
- If the learning coefficient is large (greater than 0.5), the weights are changed
  drastically, but this may cause the optimum combination of weights to be overshot,
  resulting in oscillation about the optimum.
- If the learning rate is small (less than 0.2), the weights are changed in small
  increments, causing the system to converge slowly but with little oscillation.
Sigmoidal Gain λ:
If the sigmoidal function is selected, the input-output relationship of the neuron
can be set as
O = 1 / (1 + e^−λ(I + θ))
where λ, known as the scaling factor, is the sigmoidal gain.
λ affects back propagation: an improper combination of the scaling factor and the
momentum factor leads to over-correction and poor convergence.
Threshold value θ: it is either assigned a small value and kept constant, or it can
be changed during training depending on the application.
Dealing with local minima:
Fast back propagation: adjust the activation values prior to adjusting the weights.
Extended Back Propagation for Recurrent Networks:
For recurrent networks, an extended version of back propagation is applied to find
the gradient vectors.
Consider the following network of four nodes:
x3 = f3(x1, x5)
x4 = f4(x2, x3)
x5 = f5(x4, x6)
x6 = f6(x4, x6)          ... (1)
There are two distinct operating modes through which the network may satisfy eq. (1):
1. Synchronous operation
2. Continuous operation
Soft Computing
Lecture-44
Synchronous operation:
If a network is operated synchronously, all nodes change their outputs
simultaneously according to a global clock signal, and there is a time delay
associated with each link, i.e.
x3(t+1) = f3(x1(t), x5(t))
x4(t+1) = f4(x2(t), x3(t))
x5(t+1) = f5(x4(t), x6(t))
x6(t+1) = f6(x4(t), x6(t))
Back propagation through time (BPTT):
We have to identify a set of parameters/weights that will make the output of a node
follow a given trajectory (tracking or trajectory following).
This is done by unfolding the network in time so as to transform the recurrent
network into a feedforward one, as long as t does not exceed a reasonable Tmax.
The networks in fig. (2) and fig. (3) behave identically if all copies of the
parameters or weights remain identical across different time slots. For the
parameters to remain constant, one can use parameter sharing.
After setting up the parameter nodes in this way, back propagation is applied to the
network as usual.
The error signals of a parameter node come from nodes located at layers across
different time instants; thus the back propagation procedure (and the corresponding
steepest descent) for this kind of unfolded network is often called back propagation
through time (BPTT).
Real Time Recurrent Learning (RTRL):
The only complication with BPTT is that it requires extensive computational
resources when the sequence length T is large, as the duplication of nodes makes
both the simulation time and the memory requirement proportional to T.
RTRL performs online learning, i.e. it updates the parameters while the network is
running rather than at the end of the presented sequence.
Consider the following network, and assume the error measure is
E = Σ (i=1 to T) Ei = Σ (i=1 to T) (di − xi)²
where i is the time index, di the desired output and xi the actual output.
To save computation time and memory, a better option is to minimize Ei at each time
step instead of trying to minimize E at the end of the sequence.
(Here a denotes a generic parameter and ∂⁺ the direct derivative, ignoring the
dependence through earlier time steps.)
At i = 1:
∂⁺E1/∂a = (∂E1/∂x1)(∂⁺x1/∂a)
At i = 2:
∂E2/∂a = (∂E2/∂x2)(∂x2/∂a),   where   ∂x2/∂a = ∂⁺x2/∂a + (∂x2/∂x1)(∂x1/∂a)
At i = 3:
∂E3/∂a = (∂E3/∂x3)(∂x3/∂a),   where   ∂x3/∂a = ∂⁺x3/∂a + (∂x3/∂x2)(∂x2/∂a)
Soft Computing
Lecture-45
Continuously Operated Networks: Mason's Gain Formula
Consider again the network
x3 = f3(x1, x5)
x4 = f4(x2, x3)
x5 = f5(x4, x6)
x6 = f6(x4, x6)          ... (1)
In a network that is operating in continuous mode, all nodes continuously change
their outputs until eq. (1) is satisfied; this operating mode is of particular
interest for analog circuit implementation.
Here a dynamical evolution rule is imposed on the network, for example for node 3:
T3 dx3/dt + x3 = f3(x1, x5)
When x3 stops changing, i.e. dx3/dt = 0, eq. (1) is satisfied. It is assumed that at
least one such fixed point exists for every node output.
Assuming the error measure E is a function of the node outputs, the ordered
derivatives satisfy
∂⁺E/∂x3 = (∂⁺E/∂x4)(∂f4/∂x3)
∂⁺E/∂x4 = (∂⁺E/∂x5)(∂f5/∂x4) + (∂⁺E/∂x6)(∂f6/∂x4)
∂⁺E/∂x5 = (∂⁺E/∂x3)(∂f3/∂x5) + ∂E/∂x5
∂⁺E/∂x6 = (∂⁺E/∂x5)(∂f5/∂x6) + (∂⁺E/∂x6)(∂f6/∂x6) + ∂E/∂x6
Let ∂⁺E/∂xi be denoted by εi and let wij = ∂fi/∂xj. Then
ε3 = ε4 w43
ε4 = ε5 w54 + ε6 w64
ε5 = ε3 w35 + ∂E/∂x5
ε6 = ε5 w56 + ε6 w66 + ∂E/∂x6          ... (2)
Once εi is known, the gradient for a generic parameter α in node i can be found
directly:
∂⁺E/∂α = (∂⁺E/∂xi)(∂fi/∂α) = εi ∂fi/∂α
Eq. (2) can itself be represented as a recurrent network (the error-propagation
network), as shown below.
An alternative approach for finding εi is Mason's gain formula, which is commonly
used to find the transfer function of a linear system represented as a signal flow
graph or block diagram (basically a cause-and-effect representation of a linear
system); the recurrent error-propagation network is also such a system.
Mason's Gain Formula:
The general formula relating εi and an input quantity I is
M = εi / I = (1/Δ) Σ (k=1 to N) Mk Δk
where
M = gain between εi and I
εi = output of node i of the recurrent error-propagation network
I = input
N = total number of forward paths from I to εi
Mk = gain of the k-th forward path
Δ = 1 − Σm Pm1 + Σm Pm2 − Σm Pm3 + ...
Pmr = gain product of the m-th possible combination of r non-touching loops
Δk = the Δ for that part of the network which does not touch the k-th forward path.
Expression:
To express ε3 in terms of I1 and I2 (the externally injected error terms ∂E/∂x5 and
∂E/∂x6), note that there are three loops.
Gains of these loops:
Loop 1 (5 → 4 → 3):       l1 = w54 w43 w35
Loop 2 (5 → 6 → 4 → 3):   l2 = w56 w64 w43 w35
Loop 3 (6, self-loop):    l3 = w66
Loop 1 and Loop 3 are non-touching, i.e. they do not share any common node. Hence
Δ = 1 − (l1 + l2 + l3) + (l1 l3)
To find the gain between I1 and ε3, there are two forward paths: I1-5-4-3 and
I1-5-6-4-3.
For the first path:  M1 = w54 w43,  Δ1 = 1 − w66 (only Loop 3 does not touch it).
For the second path: M2 = w56 w64 w43,  Δ2 = 1 (no non-touching loops).
G1 = (M1 Δ1 + M2 Δ2) / Δ
From I2 to ε3 there is only one forward path, I2-6-4-3:
M1 = w64 w43,  Δ1 = 1
G2 = M1 Δ1 / Δ
Hence
ε3 = G1 I1 + G2 I2
Lecture-45
Hebbian learning: based on correlative weight adjustment. This is the oldest
learning mechanism.
Here the input-output pattern pairs (xi, yi) are associated by the weight matrix W,
known as the correlation matrix, computed as
W = Σ (i=1 to n) xi yi^T
where yi^T is the transpose of the associated output vector yi.
Associative memory, which belongs to the class of single-layer feedforward networks
or recurrent networks depending on its association, exhibits Hebbian learning.
An associative memory is a storehouse of associated patterns which are encoded in
some form. When the storehouse is triggered with a pattern, the associated pattern
is recalled as output.
Hetero-associative memory: the associated patterns (x, y) are different, and the
model recalls a y given an x, or vice versa.
Auto-associative memory: x and y refer to the same pattern (pattern recognition).
Unsupervised learning: used for data clustering, feature detection and similarity
detection.
- Competitive learning.
- Kohonen self-organizing feature map.
- Principal component analysis.
Soft Computing
Competitive Learning Network:-
This network updates its weights only on the basis of the input pattern.
(Figure: three input units x1, x2, x3 fully connected to four output units.)
All input units i are connected to all output units j with weight wij:
X = [x1 x2 x3]
W = [ w11 w12 w13 w14
      w21 w22 w23 w24
      w31 w32 w33 w34 ]
- The number of inputs is the input dimension (x, y, z).
- The number of outputs is the number of clusters into which the data has to be
  divided.
A cluster center is specified by the weight vector connected to the corresponding
output unit.
The activation value aj of output unit j is specified as
aj = Σ (i=1 to 3) xi wij = X^T Wj = Wj^T X
The output unit with the highest activation is selected for further processing,
which is what is meant by competitive. If the k-th unit has the maximum activation,
the weights leading to this unit are updated according to the competitive, or
so-called winner-take-all, learning rule:
Wk(t+1) = ( Wk(t) + η (x(t) − Wk(t)) ) / || Wk(t) + η (x(t) − Wk(t)) ||
The normalization operation ensures that the updated weight vector is of unit
length.
This implements a sequential scheme for finding the cluster centers of a data set:
when an input x is presented, the weight vector closest to x rotates towards it;
consequently, weight vectors move to those areas where most inputs appear.
The Euclidean distance can also be used as the similarity measure for competitive
learning:
aj = ( Σ (i=1 to 3) (xi − wij)² )^0.5 = || X − Wj ||
The weights of the output unit with the smallest activation (distance) are then
updated as
Wk(t+1) = Wk(t) + η (x(t) − Wk(t))
The winning unit's weights shift towards the input x; in this case neither the data
nor the weights need to be of unit length.
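A small NumPy sketch of the winner-take-all rule in its Euclidean-distance form is given below; the sample 2-D data, the number of clusters and the learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# two assumed clusters of 2-D points
data = np.vstack([rng.normal([0, 0], 0.1, (50, 2)),
                  rng.normal([1, 1], 0.1, (50, 2))])
W = rng.uniform(0, 1, (2, 2))           # one weight vector (row) per cluster
eta = 0.1

for epoch in range(20):
    for x in data:
        k = np.argmin(np.linalg.norm(W - x, axis=1))   # winner: smallest ||x - w_j||
        W[k] += eta * (x - W[k])                       # winner-take-all update
print(W)                                               # approaches the cluster centers
```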
Energy analysis of a discrete Hopfield network (worked example):
For a discrete Hopfield network with connection weights Tij and thresholds Ui, each
unit is updated as
vi = 1 if Σj Tij vj > Ui
   = 0 if Σj Tij vj < Ui
   (unchanged if the two are equal),
i.e. Vi = f( Σj Tij vj − Ui ) with f(x) = 1 for x > 0 and 0 for x < 0.
The associated energy function is
E = −(1/2) Σi Σj Tij vi vj + Σi vi Ui
and each update can only decrease (or leave unchanged) E.
Example: two units with T12 = T21 = 0.3 (T11 = T22 = 0), U1 = 0.7, U2 = −0.1. The
states and their energies include
state A: (V1, V2) = (0, 0), E = 0
state B: (V1, V2) = (0, 1), E = 0 + 0·(0.7) + 1·(−0.1) = −0.1
state C: (V1, V2) = (1, 0), E = 0.7
Updating from state A: Act1 = T12·V2 − U1 = 0.3·0 − 0.7 = −0.7, so V1 = 0;
Act2 = T21·V1 − U2 = 0.3·0 + 0.1 = 0.1, so V2 = 1; the network moves to state B.
Updating from state B: Act1 = 0.3·1 − 0.7 = −0.4, so V1 = 0; Act2 = 0.3·0 + 0.1 =
0.1, so V2 = 1; the network stays in state B, which is therefore a stable
(minimum-energy) state.
Updating from state C: Act1 = 0.3·0 − 0.7 = −0.7, so V1 = 0; Act2 = 0.3·1 + 0.1 =
0.4, so V2 = 1; the network again moves to state B.
Energy functions can also encode constraints. For a 5×5 array of units vij in which
each row and each column should contain exactly one active unit (as in the
travelling-salesman type formulation), the constraint energies are
Cj = (v1j + v2j + ... + v5j − 1)²   for column j,
E1 = Σ (j=1 to 5) Cj = Σ (j=1 to 5) ( Σ (i=1 to 5) vij − 1 )²   for all columns,
E2 = Σ (i=1 to 5) ( Σ (j=1 to 5) vij − 1 )²   likewise for the rows,
together with a distance (tour-length) term
E3 = (1/2) Σj Σi Σk L(i, j) vik ( vj,k−1 + vj,k+1 )
The total energy to be minimized is the weighted sum
E = α1 E1 + α2 E2 + α3 E3
Soft Computing
Lecture-46
KOHONEN SELF-ORGANIZING NETWORK:-
- A competition-based network for data clustering.
- It imposes a neighborhood constraint on the output units, such that a certain
  topological property in the input data is reflected in the output units' weights.
- Learning is similar to that of the competitive learning network.
- A similarity measure is used to select the winning unit; the weights of the
  winning unit and of all units in its neighborhood are updated.
Step 1: find the winning unit c such that
|| X − Wc || = min over i of || X − Wi ||
Step 2: let NBc denote the set of indices corresponding to a neighborhood around the
winner c. The weights of the winner and of its neighboring units are updated as
ΔWi = η (X − Wi),   i ∈ NBc
Neighborhood function:
Ωc(i) = exp( − || Pi − Pc ||² / (2σ²) )
where Pi is the position of output unit i and Pc the position of the winning unit c.
The size of the neighborhood (σ) should decrease with time. Using the neighborhood
function, the update becomes
ΔWi = η Ωc(i) (X − Wi)
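A one-dimensional Kohonen map can be sketched as below; the line of 10 output units, the Gaussian neighborhood with shrinking η and σ, and the sample data are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.uniform(0, 1, (200, 2))          # assumed 2-D input data
n_units = 10
W = rng.uniform(0, 1, (n_units, 2))         # weight vector per output unit
pos = np.arange(n_units)                    # positions P_i on a 1-D grid

eta, sigma = 0.5, 3.0
for epoch in range(50):
    for x in data:
        c = np.argmin(np.linalg.norm(W - x, axis=1))           # winning unit
        omega = np.exp(-(pos - pos[c]) ** 2 / (2 * sigma**2))  # neighborhood Omega_c(i)
        W += eta * omega[:, None] * (x - W)                    # dW_i = eta*Omega_c(i)*(x - W_i)
    eta *= 0.95; sigma *= 0.95                                 # shrink over time
print(W)
```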
Hebbian learning:-
A simple method of synaptic weight change was suggested by Hebb: when two cells fire
simultaneously, their connection strength (weight) increases; the weight increase
between two neurons is proportional to the frequency with which they fire together.
Δwij = η yi yj        (a type of correlation learning rule)
The output of neuron i, which is received at the input of neuron j, is xi, so the
rule can be written
Δwij = η xi yj,   i.e. the weight change is proportional to the correlation of the
input and output signals.
Using the transfer function f(.), yj is given by yj = f(wj^T X), so
Δwij = η f(wj^T X) xi
This might cause unconstrained growth of the weights, so Hebb's rule is modified;
the first method is normalization.