
2. MULTI LAYER NETWORKS

UNIT II MULTI LAYER NETWORKS

Back Propagation Network (BPN) - Training - Architecture - Algorithm, Counter
Propagation Network (CPN) - Training - Architecture, Bi-Directional Associative
Memory (BAM) - Training - Stability analysis, Adaptive Resonance Theory (ART) -
ART1 - ART2 - Architecture - Training, Hopfield Network - Energy Function -
Discrete - Continuous - Algorithm - Application - Travelling Salesman Problem (TSP)
2 BACK PROPAGATION NETWORKS (BPN)

2.1 NEED FOR MULTILAYER NETWORKS

 Single-layer networks can solve only linearly separable problems; they cannot be
used to solve linearly inseparable problems
 Single-layer networks cannot solve complex problems
 Single-layer networks cannot be used when a large input-output data set is available
 Single-layer networks cannot capture the complex information available in the
training pairs

Hence, to overcome the above limitations, we use Multi-Layer Networks.

2.2 MULTI-LAYER NETWORKS

 Any neural network which has at least one layer in between the input and output layers is
called a Multi-Layer Network

 Layers present in between the input and output layers are called Hidden Layers

 Input layer neural units just collect the inputs and forward them to the next higher
layer

 Hidden layer and output layer neural units process the information fed to them and
produce an appropriate output

 Multi-layer networks provide optimal solutions for arbitrary classification problems

 Multi-layer networks apply linear discriminants to nonlinear transformations of the inputs

2.3 BACK PROPAGATION NETWORKS (BPN)

Introduced by Rumelhart, Hinton, and Williams in 1986, the BPN is a Multi-layer
Feedforward Network in which the error is propagated backwards; hence the name Back Propagation
Network (BPN). It uses a Supervised Training process, has a systematic procedure for training
the network, and is used in Error Detection and Correction. The Generalized Delta Law (also
called the Continuous Perceptron Law or Gradient Descent Law) is used in this network. The
Generalized Delta rule minimizes the mean squared error between the calculated output and the
target output. The Delta law has a faster convergence rate than the Perceptron Law, of which
it is the extended version. The main limitation of this law is the local minima problem, which
reduces the convergence speed, but it is still better than the perceptron's. Figure 1 represents
a BPN network architecture. Even though multi-level perceptrons can be used, the BPN is more
flexible and efficient. In Figure 1 the weights between the input and the hidden layer are
denoted Wij, and the weights between the hidden layer and the output layer are denoted Vjk.
This network is valid only for differentiable activation functions. The training process used
in backpropagation involves three stages, listed below:

1. Feedforward of the input training pair
2. Calculation and backpropagation of the associated error
3. Adjustment of the weights

Figure 1: Back Propagation Network

2.3.1 BPN Algorithm

The BPN algorithm is classified into four major steps as follows:

1. Initialization of Bias, Weights


2. Feedforward process
3. Back Propagation of Errors
4. Updating of weights & biases

Algorithm

I. Initialization of weights
Step 1: Initialize the weights to small random values near zero
Step 2: While the stop condition is false, do steps 3 to 10
Step 3: For each training pair, do steps 4 to 9

II. Feed forward of inputs

Step 4: Each input xi is received and forwarded to the higher layer (hidden layer)
Step 5: Each hidden unit sums its weighted inputs as follows
Zinj = Woj + Σi xi Wij
Applying the activation function
Zj = f(Zinj)
This value is passed to the output layer
Step 6: Each output unit sums its weighted inputs
yink = Vok + Σj Zj Vjk
Applying the activation function
Yk = f(yink)

III. Backpropagation of Errors

Step 7: For each output unit, compute the error term δk = (tk – Yk) f′(yink)

Step 8: For each hidden unit, compute δinj = Σk δk Vjk and the error term δj = δinj f′(Zinj)

IV. Updating of Weights & Biases

Step 9: The weight corrections are
ΔVjk = α δk Zj and ΔWij = α δj xi
The bias corrections are
ΔVok = α δk and ΔWoj = α δj

The new weights are
Wij(new) = Wij(old) + ΔWij
Vjk(new) = Vjk(old) + ΔVjk
The new biases are
Woj(new) = Woj(old) + ΔWoj
Vok(new) = Vok(old) + ΔVok
Step 10: Test for the stop condition
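The ten steps above can be sketched in NumPy as follows. The layer sizes, the learning rate α, the sigmoid activation, and the toy XOR training pairs are illustrative assumptions, not part of the algorithm statement.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
alpha = 0.5                                        # learning rate (assumed)

# Step 1: small random weights near zero
W, w0 = rng.normal(0, 0.1, (2, 4)), np.zeros(4)    # input -> hidden (Wij, Woj)
V, v0 = rng.normal(0, 0.1, (4, 1)), np.zeros(1)    # hidden -> output (Vjk, Vok)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets (assumed data)

def forward(x):
    z_in = w0 + x @ W          # Step 5: Zinj = Woj + sum_i xi Wij
    z = sigmoid(z_in)
    y_in = v0 + z @ V          # Step 6: yink = Vok + sum_j Zj Vjk
    return z_in, z, y_in, sigmoid(y_in)

mse_before = np.mean((T - forward(X)[3]) ** 2)

for epoch in range(5000):                          # Step 2: until stop condition
    for x, t in zip(X, T):                         # Step 3: each training pair
        z_in, z, y_in, y = forward(x)              # Steps 4-6: feedforward
        delta_k = (t - y) * sigmoid_deriv(y_in)             # Step 7
        delta_j = (V @ delta_k) * sigmoid_deriv(z_in)       # Step 8
        V += alpha * np.outer(z, delta_k); v0 += alpha * delta_k  # Step 9
        W += alpha * np.outer(x, delta_j); w0 += alpha * delta_j

mse_after = np.mean((T - forward(X)[3]) ** 2)      # Step 10: check error
```

The mean squared error after training should be lower than before, illustrating how the Generalized Delta rule descends the error surface (subject to the local minima problem noted above).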

2.3.2 Merits

• Has a smooth effect on weight correction
• Computing time is less if the weights are small
• 100 times faster than the perceptron model
• Has a systematic weight-updating procedure

2.3.3 Demerits

• The learning phase requires intensive calculations
• Selecting the number of hidden-layer neurons is an issue
• Selecting the number of hidden layers is also an issue
• The network can get trapped in local minima
• Temporal instability
• Network paralysis
• Training time is high for complex problems

2.4 COUNTER PROPAGATION NETWORK [CPN]

This network was proposed by Hecht-Nielsen in 1987. It implements both supervised
and unsupervised learning, being a combination of two neural architectures: (a) the Kohonen
layer (unsupervised) and (b) the Grossberg layer (supervised). It provides a good solution where
long training is not tolerated. A CPN functions like a look-up table capable of generalization.
The training pairs may be binary or continuous. A CPN produces a correct output even when the
input is partially incomplete or incorrect. The main types of CPN are (a) Full Counter
Propagation and (b) Forward-only Counter Propagation. Figure 2 represents the architectural
diagram of the CPN network.

• Forward-only nets are the simplified form of Full Counter Propagation networks
• Forward-only nets are used for approximation problems

The first layer is the Kohonen layer, which uses the competitive learning law. The procedure
used here is: when an input is provided, the weighted net value is calculated for each node.
Then the node with the maximum output is selected and the signals from the other neurons are
inhibited. The output from the winning neuron only is provided to the next higher layer, which
is the supervised Grossberg layer. Grossberg processing is similar to that of a normal
supervised algorithm.

Training of the Kohonen network

• Kohonen training implements a self-organising, unsupervised training procedure,
which takes time
• The Kohonen network uses the winner-takes-all weight-updating rule
• Uses the Euclidean distance measure or dot product for clustering/grouping of inputs

Training of the Grossberg network

• Uses supervised training
• Similar to the BPN forward pass
• Weights are updated based on the Delta Law

2.4.1 Algorithm: Full CPN

Step 1: Initialize the weights and learning rate to small random values near zero
Step 2: While the stop condition is false, do steps 3 to 9
Step 3: Set the X-input layer activations to vector X
Step 4: Each input xi is received and forwarded to the higher layer (Kohonen layer)
Step 5: Each Kohonen unit sums its weighted inputs as follows.
Inputs and weights are normalised, then the net value is calculated as
Kinj = Woj + X·W (in vector form)
Applying the activation function
Kj = f(Kinj)
Step 5A: The winning cluster unit is identified. (The node with the maximum output is selected
as the winner. Only this output is forwarded to the next, Grossberg, layer; all other units'
outputs are inhibited.)

For clustering the inputs, the Euclidean distance norm is used:
Dj = Σi (xi – vij)² + Σk (yk – wkj)²
The winner is the unit for which Dj is minimum
Step 6: Update the weights of the calculated winner unit Kj
Step 7: Test for the stop condition of Phase I
(Phase I: input X layer to Z cluster layer)
Phase II: Z cluster layer to Y output layer
Step 8: Repeat steps 5, 5A and 6 for the Phase II layers
Step 9: Test for the stop condition of Phase II
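Phase I of the algorithm above (the Kohonen competition, Steps 4 to 6) can be sketched as follows. The cluster count, learning rate, epoch count, and sample inputs are illustrative assumptions, and the supervised Grossberg Phase II is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n_clusters = 0.3, 3                  # assumed parameters
V = rng.random((n_clusters, 2))             # Kohonen cluster weights v_ij

# Two natural groups of sample inputs (illustrative data)
X = np.array([[0.10, 0.20], [0.90, 0.80], [0.15, 0.25], [0.85, 0.90]])

def winner(x):
    # Step 5A: D_j = sum_i (x_i - v_ij)^2 ; the minimum-distance unit wins
    return int(np.argmin(np.sum((V - x) ** 2, axis=1)))

for _ in range(50):                         # repeat until stable (Step 2)
    for x in X:
        j = winner(x)                       # winner-takes-all competition
        V[j] += alpha * (x - V[j])          # Step 6: update only the winner
```

After training, the winning unit's weight vector sits close to the cluster of inputs it wins, which is what lets the Grossberg layer then act as a look-up table keyed by the winner.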

2.4.2 Merits
 A combination of unsupervised (Phase I) and supervised (Phase II) learning
 The network works like a look-up table
 Fast but coarse approximation
 100 times faster than the BPN model

2.4.3 Demerits
 The learning phase requires intensive calculations
 Selecting the number of hidden-layer neurons is an issue
 Selecting the number of hidden layers is also an issue
 The network can get trapped in local minima

Figure 2: Counter Propagation Networks

2.5 BI-DIRECTIONAL ASSOCIATIVE MEMORIES

• Developed by Kosko in 1988
• Hetero-associative, with two layers (input and output) [a single layer of weights]
• Transmits signals back and forth between these layers
• The stop condition is that the activations of all neurons remain constant for several
iterations
• It is used to store and retrieve patterns
• Utilizes bidirectional weighted connection paths

2.5.1 Types of BAM

(a) Binary BAM
(b) Bipolar BAM
(c) Continuous BAM
 Continuous BAM: uses the log-sigmoid function as its activation function

Figure 3: Bi-Directional Associative memory

Image adapted from Laurene Fausett, "Fundamentals of Neural Networks: Architectures,
Algorithms and Applications", Prentice Hall.

 Figure 3 represents the BAM architecture. A BAM contains 'n' neurons in the X layer and
'm' neurons in the Y layer. Both the X and Y layers can act as input or output layers. The
weights for the X-to-Y direction are taken as Wij, and the weights for the Y-to-X direction as
WijT (the transpose). If binary or bipolar activations are used it is known as a Discrete BAM;
if continuous activations are used it comes under the Continuous BAM type.

2.5.2 Algorithm

The following steps explain the procedural flow of the Bi-Directional Associative Memory.

Step 0: Initialize the weights to store a set of P input vectors.
Set all initial activations to zero
Step 1: For each input do steps 2 to 6
Step 2A: Present the input pattern x to the X layer
Step 2B: Present the input pattern y to the Y layer
Step 3: While the activations have not converged, do steps 4 to 6
Step 4: Update the activations of the units in the Y layer. Compute the net value
yinj = Σi xi wij and the activations yj = f(yinj). Send the signal to the X layer
Step 5: Update the activations of the units in the X layer. Compute the net value
xini = Σj yj wij and the activations xi = f(xini). Send the signal to the Y layer
Step 6: Test for convergence. If the activations of layers X and Y have reached equilibrium
then stop, else continue the above process
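The storage and recall steps above can be sketched for a discrete bipolar BAM. The outer-product storage rule W = Σp xpᵀyp, the bipolar sign activation (which keeps the previous state on a zero net, the usual discrete-BAM convention), and the two stored pairs are illustrative assumptions.

```python
import numpy as np

def bipolar_sign(net, prev):
    # +1 if net > 0, -1 if net < 0, unchanged if net == 0
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

# Step 0: initialise weights from the training pairs (outer-product rule)
X = np.array([[ 1, -1,  1, -1],          # stored X-layer patterns
              [ 1,  1, -1, -1]])
Y = np.array([[ 1, -1],                  # associated Y-layer patterns
              [-1,  1]])
W = X.T @ Y                              # (n x m) weight matrix

def recall(x):
    y = np.zeros(W.shape[1], dtype=int)
    for _ in range(10):                  # Steps 3-6: iterate to equilibrium
        y_new = bipolar_sign(x @ W, y)       # Step 4: update Y layer
        x_new = bipolar_sign(y_new @ W.T, x) # Step 5: update X layer
        if np.array_equal(y_new, y) and np.array_equal(x_new, x):
            break                        # Step 6: activations converged
        x, y = x_new, y_new
    return x, y

x_out, y_out = recall(X[0].copy())       # present a stored X pattern
```

Presenting the first stored X pattern should settle the Y layer on its associated pattern, demonstrating the hetero-associative recall described above.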

2.5.3 Merits

• Unconditionally stable network
• Best for content-addressable memory
• A two-layer generalization of the Hopfield network
• Best recall

2.5.4 Demerits

• Incorrect convergence is possible
• Memory capacity is limited: the number of stored patterns 'm' must be less than 'n',
the number of neurons in the smaller layer
• Sometimes the network learns spurious patterns that were not provided to it

2.5.5 Applications of BAM

• Fault Detection
• Pattern Association
• Real Time Patient Monitoring
• Medical Diagnosis
• Pattern Mapping
• Pattern Recognition systems
• Optimization problems
• Constraint satisfaction problems

2.6 ADAPTIVE RESONANCE THEORY

 Invented by Grossberg in 1976; based on an unsupervised learning model.
 Resonance means a target vector matches the input vector closely enough.
 An ART match leads to resonance, and only in the resonance state does the ART network learn.
 Suitable for problems that use online, dynamic, large databases.
 Types:
(1) ART 1 - classifies binary input vectors
(2) ART 2 - clusters real-valued (continuous-valued) input vectors.
 Used to solve the plasticity-stability dilemma.

2.6.1 Plasticity - Stability

How does a network learn a new pattern without forgetting the old traces (patterns), and how
does it adapt to a changing environment (input)? When the patterns change (plasticity), how to
remember previously learned vectors (stability) is the problem. ART uses a competitive law
(self-regulating control) to solve this PLASTICITY-STABILITY dilemma. The simplified ART
diagram is given below in Figure 5.

The Adaptive Resonance Theory (ART) network consists of

(1) the F1 layer: the input processing unit, also called the comparison layer;
(2) the F2 layer: the clustering or competitive layer;
(3) a reset mechanism.

2.6.2 Comparison Layer: takes a 1-D input vector and transfers it to the best match in the
recognition field (the best match being the neuron in the recognition unit whose weights most
closely match the input vector).

2.6.3 Recognition Unit: produces an output proportional to the quality of the match. In this
way the recognition field allows a neuron to represent the category to which the input vectors
are classified.

Vigilance parameter: after the input vector is classified, a reset module compares the strength
of the match to the vigilance parameter (defined by the user). Higher vigilance produces fine,
detailed memories, while a lower vigilance value gives more general memories.

The schematic representation of ART-1 is shown in Figure 4. Figure 6 represents the
supplementary units present in ART-1 networks.

2.6.4 Reset module: compares the strength of the match from the recognition phase. When the
vigilance threshold is met, training starts; otherwise neurons are inhibited until a new input
is provided.

There are two sets of weights: (1) bottom-up weights, from the F1 layer to the F2 layer, and
(2) top-down weights, from the F2 layer to the F1 layer.

2.6.5 Learning in ART

There are two types of learning, explained below.

Fast learning: used in ART 1. Weight changes are rapid and take place during resonance. The
network is stabilized when the correct match at the cluster unit is reached.

Slow learning: used in ART 2. Weight change is slow and does not reach equilibrium in each
learning iteration, so more memory is required to store more input patterns (to reach
stability).

Figure 4: Schematic representation of ART-1

Images adapted from Laurene Fausett, "Fundamentals of Neural Networks: Architectures,
Algorithms and Applications", Prentice Hall.

Figure 5: Basic structure of ART-1 Network

Figure 6: Supplementary Units for ART-1


2.6.6 Basic ART Training Algorithm:

Step 1: Initialize the parameters
Step 2: For each input do steps 3 to 8
Step 3: Process the F1 layer
Step 4: While the reset condition is true, do steps 5 to 7
Step 5: Find a candidate to learn the input pattern: select the F2 unit with the maximum value
Step 6: F1(b) units combine their inputs from F1(a) and the F2 layer
Step 7: Test the reset condition. If reset is true then the selected candidate is rejected,
else it is accepted
Step 8: Learning: the weights are changed according to the differential equations
Step 9: Test for the stop condition
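The loop in Steps 2 to 9 can be sketched for a much-simplified ART-1. The uncommitted-weight initialisation (using the common L parameter), the fast-learning updates, the vigilance value ρ = 0.7, and the sample binary inputs are illustrative assumptions; the supplementary gain-control units of Figure 6 are omitted.

```python
import numpy as np

def art1(inputs, n_clusters, rho=0.7, L=2.0):
    n = inputs.shape[1]
    b = np.full((n_clusters, n), L / (L - 1 + n))  # bottom-up weights
    t = np.ones((n_clusters, n))                   # top-down weights
    labels = []
    for x in inputs:
        inhibited = set()                          # reset-inhibited F2 units
        while True:
            scores = b @ x                         # F2 responses
            for j in inhibited:
                scores[j] = -1.0
            J = int(np.argmax(scores))             # Step 5: candidate unit
            match = t[J] * x                       # Step 6: F1(b) comparison
            if x.sum() == 0 or match.sum() / x.sum() >= rho:
                # Resonance: fast learning (Step 8)
                t[J] = match
                b[J] = L * match / (L - 1 + match.sum())
                labels.append(J)
                break
            inhibited.add(J)                       # Step 7: reset, try next
            if len(inhibited) == n_clusters:
                labels.append(-1)                  # no unit available
                break
    return labels, b, t

X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])
labels, b, t = art1(X, n_clusters=3)
```

With ρ = 0.7 the second input first activates the unit that learned the first pattern, fails the vigilance test (match ratio 2/3), is reset, and recruits a fresh unit, so the three inputs end up in three separate clusters.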

2.6.7 Applications of ART

• Pattern Recognition
• Pattern Restoration
• Pattern Generalization
• Pattern Association
• Speech Recognition
• Image Enhancement
• Image Restoration
• Facial Recognition systems
• Optimization problems
• Used to solve Constraint satisfaction problems

2.7 HOPFIELD NETWORKS

The net is a fully interconnected neural net, in the sense that each unit is connected to
every other unit. The net has symmetric weights with no self-connections: Wij = Wji and
Wii = 0.

Only one unit updates its activation at a time, and each unit continues to receive an
external signal in addition to the signals from the other units in the net. The asynchronous
updating of the units allows a function, known as an energy or Lyapunov function, to be found
for the net.

2.7.1 Architecture of Hopfield Networks

The basic diagram of the Hopfield network is given in Figure 7. No separate learning
algorithm is used and no hidden units/layers are used; patterns are simply stored by learning
their energies. It is similar to the human brain in storing and retrieving memory patterns:
some patterns/images are stored, and when a similar noisy input is provided the network recalls
the related stored pattern. A neuron can be ON (+1) or OFF (-1), and neurons change state
between +1 and -1 based on the inputs they receive from other neurons. A Hopfield network is
trained to store patterns (memories); it can then recognize a previously learned (stored)
pattern from partial (noisy) inputs.

Figure 7: Architecture of Hopfield Networks

2.7.2 Types of Hopfield Network

Based on the activation function used, Hopfield networks can be classified into two types:

(a) Discrete Hopfield network (b) Continuous Hopfield network

Discrete Hopfield network - uses a discrete activation function
Continuous Hopfield network - uses a continuous activation function

Hopfield networks use a Lyapunov energy function. The energy function guarantees that the
network reaches a stable local minimum energy state which resembles one of the stored patterns.

2.7.3 Lyapunov Energy Function

The Lyapunov energy function for the discrete Hopfield network is (in its standard form)

E = –0.5 Σi Σj≠i Yi Yj Wij – Σi Xi Yi + Σi θi Yi

The change in energy due to a single-unit update is ΔE = –(neti) ΔYi

The Lyapunov energy function for the continuous Hopfield network adds an integral term for
the inverse of the continuous activation function g, and takes the form

E = –0.5 Σi Σj Wij Vi Vj – Σi Xi Vi + Σi (1/τ) ∫ g⁻¹(v) dv (integral taken from 0 to Vi)
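The discrete energy function and its single-update change can be checked numerically. The standard form E = –0.5 Σi Σj≠i Yi Yj Wij – Σi Xi Yi + Σi θi Yi is assumed here, with the illustrative choice of zero external inputs and thresholds and a random symmetric weight matrix; a single asynchronous update then changes E by exactly –(neti) ΔYi, which is never positive.

```python
import numpy as np

def energy(W, y, x, theta):
    # E = -0.5 * sum_{i != j} y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i
    # (the 0.5 * y W y form equals the i != j sum because diag(W) = 0)
    return -0.5 * y @ W @ y - x @ y + theta @ y

rng = np.random.default_rng(2)
n = 5
A = rng.integers(0, 2, (n, n)) * 2 - 1
W = (A + A.T).astype(float)            # symmetric weights, Wij = Wji
np.fill_diagonal(W, 0)                 # no self-connections, Wii = 0
x = np.zeros(n)                        # external inputs (assumed zero)
theta = np.zeros(n)                    # thresholds (assumed zero)

y = rng.integers(0, 2, n) * 2 - 1      # a random bipolar state
i = 0
net = x[i] + W[i] @ y                  # net_i = x_i + sum_j y_j w_ij
y2 = y.copy()
y2[i] = 1 if net > 0 else -1           # one asynchronous update of unit i
dE = energy(W, y2, x, theta) - energy(W, y, x, theta)
# dE equals -(net_i) * (Y_i(new) - Y_i(old)), hence dE <= 0
```

Because every asynchronous update can only lower (or preserve) E, and E is bounded below, the network must settle into a stable state; this is the convergence guarantee referred to above.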

2.7.4 Algorithm

Step 1: Initialize the weights to store the patterns
Step 2: For each input vector repeat steps 3 to 7
Step 3: Set the initial activations of the net equal to the external input vector X:
Yi = Xi (i = 1, 2, ..., n)
Step 4: Perform steps 5 to 7 for each unit Yi
Step 5: Compute the net input
yini = Xi + Σj Yj Wji
Step 6: Determine the activation response based on the activation function used
Step 7: Broadcast the value of Yi to all other units
Step 8: Test for convergence
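Steps 1 to 8 can be sketched for the discrete Hopfield network as follows. The Hebbian outer-product storage rule, the sign activation (keeping the previous state on a zero net), and the single stored pattern with a one-bit-noisy probe are illustrative assumptions.

```python
import numpy as np

def train_hopfield(patterns):
    # Step 1: W = sum_p p^T p with Wii = 0 (symmetric, no self-connections)
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0)
    return W

def recall(W, x, max_sweeps=20):
    y = x.copy()                     # Step 3: initial activations = input X
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):      # Steps 4-7: asynchronous unit updates
            net = x[i] + W[i] @ y    # Step 5: yini = Xi + sum_j Yj Wji
            new = 1 if net > 0 else (-1 if net < 0 else y[i])
            if new != y[i]:
                y[i], changed = new, True
        if not changed:              # Step 8: no unit changed -> converged
            break
    return y

stored = np.array([[1, -1, 1, -1, 1, -1]])   # one stored pattern (assumed)
W = train_hopfield(stored)
noisy = np.array([1, -1, 1, -1, 1, 1])       # last bit flipped
result = recall(W, noisy)
```

Starting from the noisy probe, the asynchronous updates flip the corrupted bit back, recovering the stored pattern and illustrating recall from partial/noisy inputs.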

2.7.5 Merits and Demerits of Hopfield Networks


Merits

 Stable network (the energy function guarantees convergence)
 Best for content-addressable memory
 Best recall from noisy or partial inputs

Demerits

 Incorrect convergence is possible
 Memory capacity is limited: the number of stored patterns 'm' must be small relative to
'n', the number of neurons
 Sometimes the network learns spurious patterns that were not provided to it

REFERENCE BOOKS

1. B. Yegnanarayana, "Artificial Neural Networks", Prentice Hall of India.

2. Simon Haykin, "Neural Networks: A Comprehensive Foundation", Second Edition, Pearson
Education.

3. Laurene Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms and
Applications", Prentice Hall.

4. James A. Freeman and David M. Skapura, "Neural Networks: Algorithms, Applications, and
Programming Techniques", Pearson Education.

************************** ALL THE BEST *******************************
