
5. Self-Organizing Map (SOM)


The development of the self-organizing map is inspired by research in neurobiology. Consider the map of the human brain:

[Figure: The map of a human brain]

The brain diagram presents the map of the cerebral cortex. The different areas of the cortex are identified by the thickness of their layers and the types of neurons within them. Some of the most important specific areas are as follows:

(1) Motor cortex: areas 4, 6, and 8.

(2) Visual cortex: areas 1, 2, and 3.

(3) Auditory cortex: areas 41 and 42.

This brain map shows clearly that different sensory inputs (motor, visual, auditory, etc.) are mapped onto corresponding areas of the cerebral cortex in an orderly fashion. These cortical maps are not entirely genetically predetermined. Rather, they are sketched in during the early development of the nervous system.

The essential point of this discovery in neurobiology lies in the principle of topographic map formation, which can be stated as follows:

The spatial location of an output neuron in the topographic map corresponds to a particular domain or feature of the input data.

The output neurons are usually arranged in a one-dimensional or two-dimensional lattice, a topology that ensures each neuron has a set of neighbors.

The manner in which the input patterns are specified determines the nature of the mapping model. In particular, we may identify two basic models. Both were inspired by the discovery that the map of the visual cortex cannot be entirely genetically predetermined; rather, a self-organizing process involving synaptic learning may be responsible for the local ordering of feature-sensitive cortical cells.

[Figure: Two models of self-organizing mapping]

The first type of mapping model has two separate two-dimensional lattices, with one projecting onto the other. The first lattice represents pre-synaptic (input) neurons, and the other lattice represents post-synaptic (output) neurons. This model tries to explain the details of the neurobiology.

The second type of mapping model has only one two-dimensional lattice. Instead of explaining the details, this type of model tries to capture the essential features of computational maps in the brain and yet remain computationally tractable. This model was developed by Kohonen and is often called the Kohonen self-organizing map.

The SOM has received considerable attention since it was proposed in 1982. It has become a popular and powerful tool for feature extraction in pattern recognition, owing to the property that the SOM approximates the input space.

Self-Organizing Map (SOM)

The principle of the Kohonen SOM is to transform an incoming signal pattern of arbitrary dimension into a one-dimensional or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion. The essential ingredients of the SOM neural network are as follows:

(1) A one- or two-dimensional lattice of neurons that computes simple discriminant functions of inputs received from an input of arbitrary dimension.

(2) A mechanism that compares these functions and selects the neuron with the largest discriminant function value.

(3) An adaptive process that enables the activated neurons to increase their discriminant values.

(4) An interactive network that activates the selected neuron and its neighbors.

[Figure: Two-dimensional lattice of neurons]

The algorithm responsible for the formation of the SOM proceeds first by initializing the synaptic weights in the network, assigning them small random values. This process is called initialization. Once the network has been properly initialized, there are three essential processes involved in the formation (training) of the self-organizing map:

(1) Competition. For each input pattern, the neurons in the network compute their respective values of a discriminant function. This discriminant function provides the basis for competition among the neurons. The neuron with the largest discriminant function value is the winner of the competition.

(2) Cooperation. The winning neuron determines the spatial location of a topological neighborhood of excited neurons.

(3) Weight adaptation. This last mechanism enables the excited neurons to increase their individual values of the discriminant function through suitable adjustments to their weights.

Competitive process

Assume an input pattern selected randomly from the input space is denoted by:

$$\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$$

The weight vector of each neuron in the network has the same dimension as the input space. Let the weight vector of neuron j be denoted by:

$$\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$$

To find the best match of the input vector x with the synaptic weight vectors w_j, we simply compare the inner products w_j^T x for j = 1, 2, ..., m, where m is the number of neurons, and select the largest. In practice, we often find it convenient to normalize the weight vectors to a constant Euclidean norm; in such a case, the maximum inner product is equivalent to the minimum Euclidean distance between the vectors. If we use i(x) to identify the neuron that best matches the input vector x, we may determine i(x) by:

$$i(\mathbf{x}) = \arg\min_{j} \|\mathbf{x} - \mathbf{w}_j\|$$

where ‖·‖ denotes the Euclidean norm of the argument vector. i(x) is the subject of attention, and the corresponding neuron i is called the winning neuron for the input vector x.
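As a minimal MATLAB sketch of this competitive step (the variable names and values below are illustrative, not part of the original notes), the winning neuron can be found by comparing the squared Euclidean distances between the input vector and every weight vector:

x = [0.3 0.7];              % 1-by-n input vector (illustrative values)
w = rand(4,2);              % m-by-n matrix whose j-th row is the weight vector of neuron j
m = size(w,1);
for j = 1:m
    v = x - w(j,:);
    d(j,1) = v*v';          % squared Euclidean distance to neuron j
end
[dmin, ix] = min(d);        % ix is the index of the winning neuron i(x)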

Cooperation process

The winning neuron locates the center of a topological neighborhood of cooperating neurons. The key question is: how do we define a topological neighborhood that is neurobiologically correct? Research in neurobiology reveals that when a neuron is excited, its immediate neighbors tend to be excited more than neurons that are farther away. This observation leads us to make the topological neighborhood around the winning neuron i decay smoothly with lateral distance. Let h_{ji} denote the topological neighborhood centered on winning neuron i, where neuron j is one of the set of excited (cooperating) neurons, and let d_{ji} denote the distance between winning neuron i and excited neuron j. We may then assume that the neighborhood h_{ji} is a unimodal function of the lateral distance d_{ji} that satisfies two distinct requirements:

(1) The topological neighborhood h_{ji} is symmetric about the maximum point defined by d_{ji} = 0. In other words, it attains its maximum value at the winning neuron.

(2) The amplitude of the topological neighborhood h_{ji} decreases monotonically with increasing lateral distance d_{ji}, decaying to zero as d_{ji} → ∞.

A typical choice of h_{ji} that satisfies these requirements is the Gaussian function:

$$h_{ji} = \exp\left(-\frac{d_{ji}^2}{2\sigma^2}\right)$$

The parameter σ is the width of the Gaussian function; it determines the degree to which excited neurons in the vicinity of the winning neuron participate in the learning process.

For cooperation among neighboring neurons to hold, it is necessary that the topological neighborhood h_{ji} depend on the distance between winning neuron i and excited neuron j in the output space, rather than on a distance measure in the original measurement space.

In the case of a one-dimensional lattice, d_{ji} is an integer defined as:

$$d_{ji} = |j - i|$$

In the case of a two-dimensional lattice, d_{ji} is defined as:

$$d_{ji} = \|\mathbf{r}_j - \mathbf{r}_i\|$$

where r_i and r_j denote the discrete positions of winning neuron i and excited neuron j, respectively, measured in the output space.
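As a small MATLAB sketch (the lattice positions and the width value below are illustrative assumptions), the lateral distance in a two-dimensional lattice and the corresponding Gaussian neighborhood value can be computed as follows:

ri = [3 4];                        % lattice position of the winning neuron i
rj = [5 7];                        % lattice position of an excited neuron j
d_ji = norm(rj - ri);              % lateral distance measured in the output space
sigma = 2;                         % width of the Gaussian neighborhood (illustrative)
h_ji = exp(-d_ji^2/(2*sigma^2));   % Gaussian neighborhood value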

Another unique feature of the SOM is that the size of the topological neighborhood shrinks as learning proceeds. This can be done by making the width parameter σ of the topological neighborhood function h_{ji} decrease with the learning process. A typical choice for the dependence of σ on the iteration number is the exponential decay:

$$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$$

where σ_0 is the value of σ at the initiation of the SOM algorithm, τ_1 is a time constant, and n is the iteration number. Thus, as n increases, the width σ(n) decreases at an exponential rate, and the topological neighborhood shrinks in a corresponding manner:

$$h_{ji}(n) = \exp\left(-\frac{d_{ji}^2}{2\sigma^2(n)}\right)$$

Besides the Gaussian neighborhood function, hexagonal and square neighborhoods are also used in the SOM. Neurons inside the defined neighborhood are excited. In such cases, h_{ji} takes the following rectangular form:

[Figure: a rectangular neighborhood function, h_{ji} = 1.0 for −K ≤ d_{ji} ≤ K and 0 otherwise]
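In MATLAB, such a rectangular neighborhood can be sketched as follows (K and d_ji below are illustrative values, not taken from the original code):

K = 2;                             % neighborhood radius (illustrative)
d_ji = 1;                          % lateral distance of neuron j from the winner
h_ji = double(abs(d_ji) <= K);     % 1.0 inside the neighborhood, 0 outside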


Adaptive process

In the weight adaptation process, the weight vector w_j of neuron j in the network is required to change in relation to the input vector x. The question is how to make this change. In the SOM algorithm, the change to the weight vector of neuron j, which is inside the topological neighborhood of winning neuron i, is as follows:

$$\Delta\mathbf{w}_j(n) = \eta(n)\, h_{j,i(\mathbf{x})}(n)\, [\mathbf{x} - \mathbf{w}_j(n)]$$

where η(n) is the learning rate at iteration n. After the change, the weight vector becomes:

$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\, h_{j,i(\mathbf{x})}(n)\, [\mathbf{x} - \mathbf{w}_j(n)]$$

The above equation has the effect of moving the weight vector of the winning neuron toward the input vector x. Upon repeated presentation of the training data, the weight vectors tend to follow the distribution of the input vectors.
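A minimal MATLAB sketch of one adaptation step (all values here are illustrative; h is assumed to hold the neighborhood values h_{j,i(x)} for the current winner):

x = [0.3 0.7];  w = rand(4,2);       % illustrative input vector and m-by-n weight matrix
eta = 0.1;  h = exp(-(0:3)'./2);     % illustrative learning rate and neighborhood values
for j = 1:size(w,1)
    % w_j(n+1) = w_j(n) + eta(n)*h_ji*(x - w_j(n))
    w(j,:) = w(j,:) + eta*h(j)*(x - w(j,:));
end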

The learning-rate parameter η(n) should also be time-varying. It should start at an initial value and then decrease gradually as the iteration number n increases, for example:

$$\eta(n) = \eta_0 \exp\left(-\frac{n}{\tau_2}\right)$$

where η_0 is the initial value and τ_2 is another time constant.

Two phases of the adaptive process

Starting from an initial state of complete disorder, the SOM algorithm gradually leads to an organized representation of the activation patterns drawn from the input space. The adaptation of the weight vectors in the network can be decomposed into two phases: an ordering or self-organizing phase followed by a convergence phase, as described below.

(1) Self-organizing (ordering) phase

It is during this first phase of the adaptive process that the topological ordering of the weight vectors takes place. The ordering phase may take as many as 1000 iterations of the SOM algorithm, or even more. In this phase, the learning-rate parameter and the neighborhood function must be carefully selected.

The learning-rate parameter η(n) should begin with a value close to 0.1 and then decrease gradually, but remain above 0.01. These desirable values are satisfied by setting η_0 = 0.1 and τ_2 = 1000, so the time-varying learning-rate formula becomes:

$$\eta(n) = 0.1 \exp\left(-\frac{n}{1000}\right)$$

The neighborhood function h_{ji}(n) should initially include almost all neurons in the network, centered on the winning neuron i, and then shrink slowly with the iterations. Specifically, during the ordering phase, which may occupy 1000 iterations or more, h_{ji}(n) is permitted to reduce to a small neighborhood of only a couple of neurons around the winning neuron, or even to the winning neuron itself. We may set σ_0 to the radius of the two-dimensional lattice, and set the time constant τ_1 as follows:

$$\tau_1 = \frac{1000}{\ln \sigma_0}$$
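A short MATLAB sketch of these schedules (assuming, for illustration, a 10-by-10 lattice and the initial width used in the code later in these notes):

eta0 = 0.1;
sigma0 = 0.707*sqrt((10-1)^2+(10-1)^2);   % roughly the radius of a 10-by-10 lattice
tau1 = 1000/log(sigma0);
for n = 1:1000
    eta(n) = eta0*exp(-n/1000);           % decays from 0.1 but stays above 0.01
    sigma(n) = sigma0*exp(-n/tau1);       % shrinks from sigma0 down to about 1
end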
(2) Convergence phase

This second phase of the adaptive process is needed to fine-tune the map and therefore provide an accurate statistical quantification of the input space. As a general rule, the number of iterations of the convergence phase must be at least 500 times the number of neurons in the network. Thus, the convergence phase may have to go on for thousands, and possibly tens of thousands, of iterations.

For good statistical accuracy, the learning-rate parameter η(n) should be maintained during the convergence phase at a small value, on the order of 0.01, but should not be allowed to reach zero.

The neighborhood function h_{ji}(n) should contain only the nearest neighbors of the winning neuron, and may eventually reduce to one or zero neighboring neurons.

Summary of the SOM algorithm

There are four basic steps involved in the algorithm, namely initialization, sampling, similarity matching, and updating. The three steps after initialization are repeated until the formation of the map is completed. The algorithm is summarized as follows.
(1) Initialization. Choose random values for the initial weight vectors w_j(0). The only restriction on the initialization is that the initial values must be different from each other. It may also be desirable to keep the magnitudes of the weights small.

(2) Sampling. Draw a sample x from the input distribution with a certain probability. Usually, x is drawn from the given set of training samples.

(3) Similarity matching. Find the winning neuron i(x) at iteration n using the minimum-distance Euclidean criterion:

$$i(\mathbf{x}) = \arg\min_{j} \|\mathbf{x} - \mathbf{w}_j\|, \qquad j = 1, 2, \ldots, N$$

(4) Updating. Adjust the synaptic weight vectors of all neurons using the update formula:

$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\, h_{j,i(\mathbf{x})}(n)\, [\mathbf{x} - \mathbf{w}_j(n)]$$

(5) Continuation. Repeat steps 2-4 until no noticeable changes in the map are observed.
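These five steps can be sketched compactly in MATLAB as follows. This is only a minimal illustration: it uses a fixed learning rate, updates only the winning neuron (the simplest possible neighborhood), and simply repeats for a fixed number of iterations; the full two-phase implementation is given later in these notes.

data = rand(100,2);                     % illustrative training samples
N = 4;                                  % number of neurons (illustrative)
[no_sample, n_feature] = size(data);
w = 0.1*rand(N, n_feature);             % (1) initialization with small random weights
eta = 0.1;                              % fixed learning rate (a simplification)
for iter = 1:1000
    k = fix(rand(1)*no_sample) + 1;     % (2) sampling
    for j = 1:N                         % (3) similarity matching
        v = data(k,:) - w(j,:);
        d(j,1) = v*v';
    end
    [dmin, ix] = min(d);
    w(ix,:) = w(ix,:) + eta*(data(k,:) - w(ix,:));   % (4) updating (winner only here)
end                                     % (5) continuation: repeated for a fixed number of steps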

Example of SOM

Consider a computer simulation of a SOM arranged in a two-dimensional lattice with 10 rows and 10 columns. The network is trained with a two-dimensional input vector x, whose elements x_1 and x_2 are uniformly distributed in the region:

$$-1 \le x_1 \le 1, \qquad -1 \le x_2 \le 1$$
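Assuming the MATLAB function som listed later in these notes, this simulation could be set up as follows (the number of samples is an illustrative choice):

data = 2*rand(1000,2) - 1;     % 1000 points uniformly distributed in [-1,1] x [-1,1]
w = som(data, 10, 10);         % train a SOM with a 10-by-10 lattice of neurons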

[Figure: Example of SOM training]

Example 1

Given 4 training samples:

$$\mathbf{x}_1 = [1, 1, 0, 0]^T \qquad \mathbf{x}_2 = [0, 0, 0, 1]^T$$
$$\mathbf{x}_3 = [1, 0, 0, 0]^T \qquad \mathbf{x}_4 = [0, 0, 1, 1]^T$$

we wish to find 2 clusters of the training samples. Suppose the learning rate is:

$$\eta(0) = 0.6, \qquad \eta(n+1) = 0.5\,\eta(n)$$

The neighborhood function is set so that only the winning neuron has its weights updated at each step.

Step 1: initialization. Initialize two weight vectors:

$$\mathbf{w}_1(0) = [0.2, 0.6, 0.5, 0.9]^T \qquad \mathbf{w}_2(0) = [0.8, 0.4, 0.7, 0.3]^T$$

Step 2: randomly select a sample, say x_1; we have:

$$d_1^2 = \|\mathbf{w}_1(0) - \mathbf{x}_1\|^2 = (0.2-1)^2 + (0.6-1)^2 + (0.5-0)^2 + (0.9-0)^2 = 1.86$$
$$d_2^2 = \|\mathbf{w}_2(0) - \mathbf{x}_1\|^2 = 0.98$$

So the winning neuron is neuron 2, i(x) = 2; update the weights:

$$\mathbf{w}_2(1) = \mathbf{w}_2(0) + 0.6\,[\mathbf{x}_1 - \mathbf{w}_2(0)] = [0.92, 0.76, 0.28, 0.12]^T$$
$$\mathbf{w}_1(1) = \mathbf{w}_1(0) = [0.2, 0.6, 0.5, 0.9]^T$$

Step 3: randomly select another sample, say x_2; we have:

$$d_1^2 = \|\mathbf{w}_1(1) - \mathbf{x}_2\|^2 = 0.66$$
$$d_2^2 = \|\mathbf{w}_2(1) - \mathbf{x}_2\|^2 = 2.28$$

So the first neuron wins, i(x) = 1; update the weights:

$$\mathbf{w}_1(2) = \mathbf{w}_1(1) + 0.3\,[\mathbf{x}_2 - \mathbf{w}_1(1)] = [0.14, 0.42, 0.35, 0.93]^T$$
$$\mathbf{w}_2(2) = \mathbf{w}_2(1) = [0.92, 0.76, 0.28, 0.12]^T$$
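The hand calculation above can be checked with a few lines of MATLAB (a sketch that, as specified in this example, updates only the winning neuron):

x1 = [1 1 0 0];   x2 = [0 0 0 1];
w1 = [0.2 0.6 0.5 0.9];   w2 = [0.8 0.4 0.7 0.3];
eta = 0.6;

% Step 2: present x1
d1 = sum((w1 - x1).^2);   d2 = sum((w2 - x1).^2);   % 1.86 and 0.98, so neuron 2 wins
w2 = w2 + eta*(x1 - w2);                            % [0.92 0.76 0.28 0.12]
eta = 0.5*eta;                                      % learning rate halves each step

% Step 3: present x2
d1 = sum((w1 - x2).^2);   d2 = sum((w2 - x2).^2);   % 0.66 and about 2.28, so neuron 1 wins
w1 = w1 + eta*(x2 - w1);                            % [0.14 0.42 0.35 0.93]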

The SOM algorithm described above can be implemented in MATLAB as follows. The function som trains a map of no_row x no_column neurons on the rows of data, running the self-organizing phase followed by the convergence phase; the squared lattice distances are supplied by the helper function distance_matrix listed afterwards.

function w=som(data,no_row,no_column)
% Generate the initial weight vectors
[no_sample,no_feature]=size(data);
no_cluster=no_row*no_column;
w=rand(no_cluster,no_feature);

% Define the initial values of the time constants and the learning rate
eta0=0.1;
sigma0=0.707*sqrt((no_row-1)^2+(no_column-1)^2);
tau1=1000/log(sigma0);

% Generate the distance matrix
% D(i,j) denotes the squared lattice distance from neuron i to neuron j
D=distance_matrix(no_row,no_column);

% The self-organizing (ordering) phase: we perform 1000 iterations
for N=1:1000
    % calculate the learning rate at the current iteration
    eta=eta0*exp(-N/1000);
    % calculate the width of the neighborhood function
    sigma=sigma0*exp(-N/tau1);

    % sampling
    k=fix(rand(1)*no_sample)+1;      % k is the index of the selected sample
    % compete
    for i=1:no_cluster
        x=w(i,:)-data(k,:);
        d(i,1)=x*x';
    end
    % find the winner
    [p,ind]=min(d);                  % ind is the index of the winner
    % weight updating for all neurons
    for i=1:no_cluster
        h=exp(-D(ind,i)/2/sigma^2);
        w(i,:)=w(i,:)+eta*h*(data(k,:)-w(i,:));
    end
end

% The convergence phase: the learning rate is kept as a small constant
eta=0.01;
% We perform 2000 iterations
for N=1:2000
    % sampling
    k=fix(rand(1)*no_sample)+1;
    % compete
    for i=1:no_cluster
        x=w(i,:)-data(k,:);
        dd(i,1)=x*x';
    end
    % find the winner
    [p,ind]=min(dd);                 % ind is the index of the winner
    % update the weight of the winning neuron (winner) only
    w(ind,:)=w(ind,:)+eta*(data(k,:)-w(ind,:));
end
function D=distance_matrix(no_row,no_column)
% We first define the lattice position of each neuron
s=0;
for i=1:no_row
    for j=1:no_column
        s=s+1;             % index of the neuron
        p(s,1)=i;
        p(s,2)=j;
    end
end

% Calculate the squared distance between every pair of neurons
no_cluster=no_row*no_column;
for i=1:no_cluster
    for j=1:no_cluster
        D(i,j)=(p(i,1)-p(j,1))^2+(p(i,2)-p(j,2))^2;
    end
end

Example 2

[Figure: Initial weight vectors]

After the self-organizing phase (denoted by )

34

17
After the convergence phase (denoted by “o”)

35

Properties of the Feature Map

Once the SOM algorithm has converged, the feature map produced by the algorithm displays important statistical characteristics of the input space.

(1) Approximation of the input space. The feature map, represented by the set of weight vectors in the output space, provides a good approximation to the input space. The basic aim is to store a large set of input vectors by finding a smaller set of prototypes.

(2) Topological ordering. The feature map computed by the SOM algorithm is topologically ordered in the sense that the spatial location of a neuron in the lattice corresponds to a particular domain of the input patterns. This property is a direct consequence of the update equation, which forces the weight vector w_i of the winning neuron to move toward x.

Consider 10 digits, represented by 118 dot matrices, we build a
5 5 2-D SOM map.

37

The initial weights are generated randomly and are shown below:

[Figure: the randomly generated initial weight vectors]

The resulting SOM, with the weights viewed as digits, is shown below. Clearly, the weights of adjacent neurons are similar.

[Figure: the trained SOM, with each neuron's weights displayed as a digit pattern]

(3) Density matching. The map reflects variations in the statistics of the input distribution: regions of the input space from which samples are drawn with a high probability of occurrence are mapped onto larger domains of the output space, and therefore with better resolution, than regions of the input space from which samples are drawn with a low probability of occurrence.

Consider the example shown below, where 20 samples (1/6 of the data) are from cluster 1 and 100 samples (5/6 of the data) are from cluster 2. After training, 1 of the 6 neurons corresponds to cluster 1, and 5 of the 6 neurons correspond to cluster 2.
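As a rough sketch of this experiment (the cluster centers and spreads below are assumptions for illustration, not taken from the original figures), the data could be generated and a one-dimensional SOM of 6 neurons trained with the som function given earlier:

c1 = repmat([0 0], 20, 1) + 0.1*randn(20, 2);     % cluster 1: 20 samples (1/6 of the data)
c2 = repmat([1 1], 100, 1) + 0.1*randn(100, 2);   % cluster 2: 100 samples (5/6 of the data)
data = [c1; c2];
w = som(data, 1, 6);      % a 1-by-6 lattice; roughly 1 of the 6 neurons settles near cluster 1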

[Figure: the training samples]

[Figure: initial weight vectors (denoted by □)]

[Figure: weight vectors after the convergence phase (denoted by “o”)]

(4) Feature selection. Given data from an input space with a nonlinear distribution, the self-organizing map is able to select a set of the best features for approximating the underlying distribution.

For a linear mapping between u and x, as shown below, we can use a straight line to describe the mapping.

[Figure: a linear mapping between u and x]

For the nonlinear mapping shown below, it is impossible to use a straight line (a linear equation) to describe the data. However, a self-organizing map built on a one-dimensional lattice of neurons is able to overcome this approximation problem.

[Figure: a nonlinear mapping approximated by a one-dimensional SOM]
