Module 3
UNSUPERVISED NETWORKS
CONTENTS
• Unsupervised Networks
• An Example:
• Suppose we have a set of students
• Let us classify them on the basis of their performance
• Each student's score is calculated
UNSUPERVISED LEARNING NETWORKS
CONTD…
• The one whose score is higher than all others will be the
winner
• The same principle is followed for pattern classification in
neural networks
• These nets are called competitive nets
• The extreme form of these nets is called winner-take-all
• In such a case, only one neuron in the competing group will
possess a non-zero output signal at the end of the
competition
UNSUPERVISED LEARNING NETWORKS
CONTD…
• Several ANNs exist under this category:
1. Maxnet
2. Mexican hat
3. Hamming net
4. Kohonen self-organizing feature map
5. Counter propagation net
6. Learning Vector Quantization (LVQ)
7. Adaptive Resonance Theory (ART)
UNSUPERVISED LEARNING NETWORKS
CONTD…
• In the case of these ANNs, the net seeks to find patterns or regularity in the input data by forming clusters
• ART networks are called clustering nets
• In such nets there are as many input units as the input vector has components
• Since each output unit represents a cluster, the number of output units limits the number of clusters that can be formed
• The learning algorithm used in most of these nets is known as Kohonen learning
KOHONEN LEARNING
• In Kohonen (winner-take-all) learning, only the winning cluster unit J updates its weights:
𝑤𝐽(𝑛𝑒𝑤) = 𝑤𝐽(𝑜𝑙𝑑) + 𝛼[𝑥 − 𝑤𝐽(𝑜𝑙𝑑)]
• (b) For the input vector (0.6, 0.6) with learning rate 0.1, find the winning cluster unit and its new weights.
EXAMPLE CONTD…
[Figure: Kohonen self-organizing map with input units 𝑋1, 𝑋2 (signals 𝑥1, 𝑥2) connected to cluster units 𝑌1–𝑌5; the connection weights are shown on the diagram.]
EXAMPLE CONTD…
(a) For the input vector (𝑥1, 𝑥2) = (0.2, 0.4) and learning rate 𝛼 = 0.2, assume the initial weight matrix W (chosen arbitrarily):
W =
0.2  0.9
0.4  0.7
0.6  0.5
0.8  0.3
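As an illustration of how the winning cluster unit is found and updated, here is a minimal Python sketch of one Kohonen learning step. It reuses the 4×2 weight matrix above, interpreted with rows as input units and columns as cluster units; the binary input vector [0 0 1 1] and the learning rate 0.5 are assumptions made only for this sketch, not values taken from the example.

```python
import numpy as np

# Weight matrix from the example above: rows = input units, columns = cluster units
W = np.array([[0.2, 0.9],
              [0.4, 0.7],
              [0.6, 0.5],
              [0.8, 0.3]])

x = np.array([0.0, 0.0, 1.0, 1.0])   # assumed input vector (4 components)
alpha = 0.5                          # assumed learning rate

# Squared Euclidean distance D(j) = sum_i (x_i - w_ij)^2 for each cluster unit j
D = ((x[:, None] - W) ** 2).sum(axis=0)
J = int(np.argmin(D))                # winning cluster unit (smallest distance)

# Kohonen (winner-take-all) update: only the winner's weights move toward x
W[:, J] = W[:, J] + alpha * (x - W[:, J])

print("D(j) =", D, "winner J =", J + 1)
print("updated weights of the winner:", W[:, J])
```

With these assumed values, D(1) = 0.40 and D(2) = 2.04, so 𝑌1 wins and its weights move halfway toward x, becoming (0.1, 0.2, 0.8, 0.9).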
LEARNING VECTOR QUANTIZATION
(LVQ)
LVQ
• It has n input and m output units. The weight from the ith input unit to the jth output unit is given by wij
• Each output unit is associated with a class/cluster/category
• x: training vector (x1, x2, …, xn)
• T: category or class of the training vector x
• wj: weight vector for the jth output unit (w1j, …, wij, …, wnj)
• cj: cluster or class or category associated with the jth output unit
• The (squared) Euclidean distance of the jth output unit is
D(j) = Σ_{i=1}^{n} (x_i − w_{ij})²
TRAINING ALGORITHM
• For each training vector x with class T, compute D(j) for every output unit and find the winner J (minimum distance). If T = cJ, update 𝑤𝐽(𝑛𝑒𝑤) = 𝑤𝐽(𝑜𝑙𝑑) + 𝛼[𝑥 − 𝑤𝐽(𝑜𝑙𝑑)]; if T ≠ cJ, update 𝑤𝐽(𝑛𝑒𝑤) = 𝑤𝐽(𝑜𝑙𝑑) − 𝛼[𝑥 − 𝑤𝐽(𝑜𝑙𝑑)]
Weight matrix after the first input pattern:
0     1
0     0
1.1   0
1     0
Second input pattern [1 1 0 0], T = 1; calculate the Euclidean distance.
D(2) is minimum, so J = 2. Since 𝑡 ≠ 𝐽, the winning unit's weights move away from the input:
0     1
0    −0.1
1.1   0
1     0
Third input pattern [0 1 1 0], T = 1; calculate the Euclidean distance.
D(1) is minimum, so J = 1. Since t = J, the winning unit's weights move toward the input:
0     1
0.1  −0.1
1.09  0
0.9   0
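The same walk-through can be written as a short Python sketch. The initial reference vectors (0, 0, 1, 1) for class 1 and (1, 0, 0, 0) for class 2, the first training pair, and α = 0.1 are assumptions inferred so that the updates reproduce the matrices shown above; the two-branch rule itself is the standard LVQ update.

```python
import numpy as np

# Assumed initial reference (codebook) vectors, one column per output unit,
# chosen to be consistent with the weight matrices shown above.
W = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [1.0, 0.0]])
classes = [1, 2]        # class c_j associated with each output unit
alpha = 0.1             # assumed learning rate

# Assumed training pairs (input vector, target class T)
training = [
    (np.array([0, 0, 0, 1]), 2),
    (np.array([1, 1, 0, 0]), 1),
    (np.array([0, 1, 1, 0]), 1),
]

for x, T in training:
    D = ((x[:, None] - W) ** 2).sum(axis=0)   # D(j) = sum_i (x_i - w_ij)^2
    J = int(np.argmin(D))                      # winning output unit
    if T == classes[J]:
        W[:, J] += alpha * (x - W[:, J])       # correct class: move toward x
    else:
        W[:, J] -= alpha * (x - W[:, J])       # wrong class: move away from x
    print(f"x={x.tolist()}, T={T}, winner J={J + 1}, D={D.round(3).tolist()}")

print(W)
```

Running the loop reproduces the three weight matrices above, ending with w1 = (0, 0.1, 1.09, 0.9) and w2 = (1, −0.1, 0, 0).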
AN EXAMPLE
• Part (a):
• For the given input vector (𝑥1, 𝑥2) = (0.25, 0.25) with 𝛼 = 0.25 and t = 1, we compute the square of the Euclidean distance as follows:
• D(j) = (𝑤1𝑗 − 𝑥1)² + (𝑤2𝑗 − 𝑥2)²
SOLUTION CONTD…
• For j = 1 to 4
• D(1) = 0.005, D(2) = 0.125, D(3) = 0.145 and D(4) = 0.425
• As D(1) is minimum, the winner index J = 1. Also t =1
• So, we use the formula
𝑤𝐽(𝑛𝑒𝑤) = 𝑤𝐽(𝑜𝑙𝑑) + 𝛼[𝑥 − 𝑤𝐽(𝑜𝑙𝑑)]
• The updated weights on the winner unit are
𝑤11(𝑛𝑒𝑤) = 𝑤11(𝑜𝑙𝑑) + 𝛼[𝑥1 − 𝑤11(𝑜𝑙𝑑)] = 0.2 + 0.25(0.25 − 0.2) = 0.2125
𝑤21(𝑛𝑒𝑤) = 𝑤21(𝑜𝑙𝑑) + 𝛼[𝑥2 − 𝑤21(𝑜𝑙𝑑)] = 0.2 + 0.25(0.25 − 0.2) = 0.2125
SOLUTION CONTD…
• The fourth unit is the winner unit, being the closest to the input vector
• Since 𝑡 ≠ 𝐽, the weight update formula to be used is:
• 𝑤𝐽(𝑛𝑒𝑤) = 𝑤𝐽(𝑜𝑙𝑑) − 𝛼[𝑥 − 𝑤𝐽(𝑜𝑙𝑑)]
• Updating the weights on the winner unit, we obtain the new weights
ADAPTIVE RESONANCE THEORY (ART)
• In ART, when the input pattern and a cluster unit's stored prototype match closely enough, the weights are adapted and the network is said to be in a resonant state.
Key Innovation
The key innovation of ART is the use of “expectations.”
– As each input is presented to the network, it is compared with the prototype vector that it most closely matches (the expectation).
– If the match between the prototype and the input vector is NOT satisfactory, a new prototype is selected. In this way, previously learned memories (prototypes) are not destroyed by new learning.
ART NETWORK
There are two versions of it: ART1 and ART2
– ART1 was developed for clustering binary vectors
– ART2 was developed to accept continuous valued vectors
• For each pattern presented to the network, an appropriate cluster unit is chosen and the weights of the cluster unit are adjusted to let that cluster unit learn the pattern
• The network controls the degree of similarity of the patterns placed on the same cluster units
• During training:
• Each training pattern may be presented several times
• A pattern should not be placed on different cluster units on its different presentations
Stability-Plasticity Dilemma (SPD)
• The net must remain plastic enough to learn new patterns, yet stable enough that previously learned patterns are not disturbed; ART is designed to resolve this dilemma
FUNDAMENTAL ARCHITECTURE
• The bottom-up weights are used for the connections from the F1(b) layer to the F2 layer and are represented by 𝑏𝑖𝑗 (ith F1 unit to the jth F2 unit)
• The top-down weights are used for the connections from F2 layer
to F1(b) layer and are represented by 𝑡𝑗𝑖 (jth F2 unit to ith F1 unit)
• The competitive layer in this case is the cluster layer, and the cluster unit with the largest net input becomes the candidate to learn the input pattern
• The interface units combine the data from the input and cluster layer units
• On the basis of the similarity between the top-down weight vector and the input vector, the cluster unit may be allowed to learn the input pattern
• This decision is made by the reset mechanism unit on the basis of the signals it receives from the interface portion and the input portion of the F1 layer
• When a cluster unit is not allowed to learn, it is inhibited and a new cluster unit is selected as the candidate
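To make this reset/vigilance decision concrete, below is a minimal Python sketch of ART1-style candidate selection for a single binary input. The bottom-up weights, top-down prototypes, vigilance value ρ and the input vector are all illustrative assumptions; the test applied, the fraction of the input preserved by the candidate's top-down prototype compared against ρ, is the standard ART1 similarity check.

```python
import numpy as np

rho = 0.7                                  # assumed vigilance parameter
x = np.array([1, 0, 1, 1])                 # assumed binary input vector (F1 layer)

# Assumed bottom-up weights b[i, j] (F1 -> F2) and top-down weights t[j, i] (F2 -> F1)
b = np.array([[0.5, 0.2],
              [0.1, 0.6],
              [0.5, 0.2],
              [0.4, 0.3]])
t = np.array([[1, 0, 1, 0],                # prototype stored by cluster unit 1
              [0, 1, 0, 1]])               # prototype stored by cluster unit 2

inhibited = set()
while True:
    # Net input to each F2 (cluster) unit; inhibited units cannot compete
    y = x @ b
    y[list(inhibited)] = -np.inf
    J = int(np.argmax(y))                  # candidate cluster unit

    # Vigilance test: how much of x survives the candidate's top-down prototype?
    match = np.logical_and(x, t[J]).sum() / x.sum()
    if match >= rho:
        print(f"unit {J + 1} resonates (match = {match:.2f}) and learns the pattern")
        break
    inhibited.add(J)                       # reset: inhibit this unit, try the next one
    if len(inhibited) == b.shape[1]:
        print("no existing cluster matches; a new cluster unit would be created")
        break
```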
FUNDAMENTAL OPERATING PRINCIPLE
• ART accepts an input vector and classifies it into the cluster whose stored prototype it most resembles; if no existing prototype matches within the vigilance limit, a new cluster unit is committed to it
DEEP LEARNING AND RECURRENT NEURAL NETWORKS (RNN)
• The concept of deep learning is not new. It has been around for a couple of years now. It is hyped nowadays because earlier we did not have enough data and the computational power needed to train such networks.
• Similarly, h(1) from the previous step serves as the input, along with X(2), for the next step, and so on.
h_t = tanh(Whh · h_{t−1} + Whx · x_t)
where W is the weight, h is the single hidden vector, Whh is the weight at the previous hidden state, Whx is the weight at the current input state, and tanh is the activation function that implements a non-linearity, squashing the activations to the range [−1, 1].
Output: y_t = Why · h_t
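A minimal NumPy sketch of the forward pass of a vanilla RNN cell using the equations above; the layer sizes, the random weights and the input sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2   # assumed sizes for illustration

# Weight matrices of a vanilla RNN cell (randomly initialised here)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # previous hidden -> hidden
W_hx = rng.standard_normal((hidden_size, input_size)) * 0.1   # current input   -> hidden
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden          -> output

h = np.zeros(hidden_size)                                     # initial hidden state h(0)
xs = [rng.standard_normal(input_size) for _ in range(5)]      # a short input sequence

for t, x in enumerate(xs, start=1):
    # h_t = tanh(Whh . h_{t-1} + Whx . x_t): tanh squashes activations to [-1, 1]
    h = np.tanh(W_hh @ h + W_hx @ x)
    y = W_hy @ h                                              # output at step t
    print(f"step {t}: h in [{h.min():.2f}, {h.max():.2f}], y = {y.round(3)}")
```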
• By the chain rule, if any one of the gradients approaches 0, all the gradients rush to zero exponentially fast due to the multiplication. Such states no longer help the network learn anything. This is known as the vanishing gradient problem.
• Other RNN architectures, like the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit), can be used to deal with the vanishing gradient problem.
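A quick numerical sketch of why the chain-rule product vanishes: multiplying many per-step gradient factors whose magnitude is below 1 (as the derivative of tanh typically is) drives the overall gradient toward zero. The factor 0.5 below is an arbitrary illustration.

```python
# Illustrative only: repeated multiplication of per-step gradient factors < 1
factor = 0.5          # assumed magnitude of a single step's gradient factor
gradient = 1.0
for step in range(1, 31):
    gradient *= factor
    if step % 10 == 0:
        print(f"after {step} steps the gradient has shrunk to {gradient:.2e}")
# after 30 steps the product is about 9.3e-10 -- effectively zero for learning
```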