Self-Organizing Map (SOM)
The brain diagram presents a map of the cerebral cortex. The
different areas of the cortex are identified by the thickness of
their layers and the types of neurons within them. Some of the
most important specific areas are as follows:
The essential point of the discovery in neurobiology lies in
the principle of topographic map formation. This principle can
be stated as follows:
The spatial location of an output neuron in the topographic map
corresponds to a particular domain or feature of the input data.
The first type of mapping model has two separate two-dimensional
lattices, with one projecting onto the other. The first
lattice represents pre-synaptic (input) neurons, and the other
lattice represents post-synaptic (output) neurons. This model tries
to explain details of the neurobiology.
Two-dimensional lattice of neurons
Competitive process
Assume an input pattern selected randomly from the input space,
denoted by
$x = [x_1, x_2, \ldots, x_n]^T$
The weight vector of each neuron in the network has the same
dimension as the input space. Let the weight vector of neuron $j$
be denoted by
$w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$
The neuron whose weight vector is closest to $x$ (i.e., has the smallest
Euclidean distance to it) wins the competition; its index is denoted by $i(x)$.
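As a minimal MATLAB sketch of this competitive step (consistent with the competition loop in the training code later in these notes; the names W, x and d here are illustrative), the winner is the neuron with the smallest squared Euclidean distance to the input:

% x: 1-by-n input pattern; W: weight matrix with one weight vector per row
for j=1:size(W,1)
diff=W(j,:)-x;
d(j,1)=diff*diff'; % squared Euclidean distance to neuron j
end
[p,winner]=min(d); % i(x): index of the winning (best-matching) neuron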
Cooperation process
The winning neuron locates the center of a topological
neighborhood of cooperating neurons. The key question is: how do
we define a topological neighborhood that is neurobiologically
correct? Research in neurobiology reveals that when a neuron is
excited, its immediate neighborhood tends to be excited more
strongly than neurons that are farther away. This observation leads us
to make the topological neighborhood around the winning
neuron $i$ decay smoothly with distance. Let $h_{ji}$ denote the
topological neighborhood centered on winning neuron $i$, where
neuron $j$ is one of the set of excited (cooperating) neurons. The
distance between winning neuron $i$ and excited neuron $j$ is
denoted by $d_{ji}$. Then we may assume that the neighborhood $h_{ji}$
is a unimodal function of the lateral distance $d_{ji}$ such that it
satisfies two distinct requirements:
(1) The topological neighborhood $h_{ji}$ is symmetric about the
maximum point defined by $d_{ji} = 0$. In other words, it attains its
maximum value at the winning neuron.
(2) The amplitude of the topological neighborhood $h_{ji}$ decreases
monotonically with increasing lateral distance $d_{ji}$, decaying
to zero as $d_{ji} \to \infty$.
The parameter $\sigma$ is the width of the Gaussian neighborhood function
defined below; it determines the degree to which excited neurons in the
vicinity of the winning neuron participate in the learning process.
$h_{ji}(n) = \exp\!\left(-\dfrac{d_{ji}^2}{2\sigma^2(n)}\right)$
[Figure: the Gaussian neighborhood function plotted against the lateral distance $d$, peaking at 1.0 at the winning neuron.]
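As a quick illustration (the distance values and width below are assumed, not from the notes), the neighborhood values fall off smoothly with lateral distance:

d_ji=0:4; % lateral distances from the winning neuron
sigma=1.5; % an assumed current width sigma(n)
h=exp(-d_ji.^2/(2*sigma^2)) % h ≈ 1.000  0.801  0.411  0.135  0.029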
Adaptive process
In the weight adaptation process, the weight vector $w_j$ of neuron $j$
in the network is required to change in relation to the input
vector $x$. The question is how to make the change. In the SOM
algorithm, the change to the weight vector of neuron $j$, which is
inside the topological neighborhood of winning neuron $i$, is as
follows:
$\Delta w_j(n) = \eta(n)\, h_{j,i(x)}(n)\, [x - w_j(n)]$
The above equation has the effect of moving the weight vector
of the winning neuron toward the input vector $x$. Upon repeated
presentation of the training data, the weight vectors tend to
follow the distribution of the input vectors.
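A minimal MATLAB sketch of one adaptation step, assuming the winner has already been found and that the neighborhood values h(j) and the learning rate eta for the current iteration are available (the names here are illustrative):

% Move every weight vector toward the input x, scaled by the learning
% rate and by its neighborhood value relative to the winning neuron.
for j=1:size(W,1)
W(j,:)=W(j,:)+eta*h(j)*(x-W(j,:));
end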
The learning-rate parameter $\eta(n)$ should be time-varying. It
should start at an initial value and then decrease gradually with
the increase of iteration number $n$ as shown below:
$\eta(n) = \eta_0 \exp\!\left(-\dfrac{n}{\tau_2}\right)$
where $\eta_0$ is the initial value and $\tau_2$ is another time constant.
(1) Self-organizing (ordering) phase
It is during this first phase of the adaptive process that the
topological ordering of the weight vectors takes place. The
ordering phase may take as many as 1000 iterations of the SOM
algorithm, or even more. In this phase, the learning-rate
parameter and the neighborhood function must be carefully
selected.
The learning-rate parameter $\eta(n)$ should begin with a value close
to 0.1 and then decrease gradually, but remain above 0.01.
These desirable values are satisfied by the setting $\eta_0 = 0.1$ and
$\tau_2 = 1000$. Thus, the time-varying learning-rate formula becomes:
$\eta(n) = 0.1 \exp\!\left(-\dfrac{n}{1000}\right)$
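With this setting, $\eta(n)$ starts at 0.1 and is still about $0.1\,e^{-1} \approx 0.037$ at $n = 1000$, so it indeed stays above 0.01 throughout the ordering phase.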
The neighborhood function $h_{ji}(n)$ should initially include almost
all neurons in the network centered on the winning neuron $i$, and
then shrink slowly with iterations. Specifically, during the
ordering phase, which may occupy 1000 iterations or more, $h_{ji}(n)$ is
permitted to reduce to a small value covering only a couple of
neighboring neurons around the winning neuron, or even the
winning neuron itself. We may set the initial size $\sigma_0$ to the radius of
the two-dimensional lattice, and set the time constant $\tau_1$ as
follows:
$\tau_1 = \dfrac{1000}{\ln \sigma_0}$
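This choice of $\tau_1$ is consistent with the usual exponential shrinking of the width, $\sigma(n) = \sigma_0 \exp(-n/\tau_1)$ (the same $\sigma_0$ and $\tau_1$ appear in the MATLAB code below): after the 1000-iteration ordering phase, $\sigma(1000) = \sigma_0 \exp(-\ln \sigma_0) = 1$, i.e., the neighborhood has shrunk to roughly a single-neuron radius.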
(2) Convergence phase
This second phase of the adaptive process is needed to fine-tune
the map and therefore provide an accurate statistical
quantification of the input space. As a general rule, the number
of iterations of the convergence phase must be at least 500 times
the number of neurons in the network, after which the adaptation
is completed. The algorithm is summarized as follows.
(1) Initialization. Choose random values for the initial weight
vectors $w_j(0)$. The only restriction on the initialization is
that the initial values must be different from each other. And
it may be desirable to keep the magnitude of the weights
small.
(5) Continuation. Repeat steps 2-4 until no noticeable changes
in the map are observed.
Example of SOM training
Example 1
Given 4 training samples:
$x_1 = [1, 1, 0, 0]^T$, $x_2 = [0, 0, 0, 1]^T$
$x_3 = [1, 0, 0, 0]^T$, $x_4 = [0, 0, 1, 1]^T$
Step 2: randomly select a sample, say $x_1$; we have:
$d_1 = \| w_1(0) - x_1 \|^2 = (0.2-1)^2 + (0.6-1)^2 + (0.5-0)^2 + (0.9-0)^2 = 1.86$
$d_2 = \| w_2(0) - x_1 \|^2 = 0.98$
So the winning neuron is neuron 2, $i(x) = 2$; update the weights:
$w_2(1) = w_2(0) + 0.6\,[x_1 - w_2(0)] = [0.92, 0.76, 0.28, 0.12]^T$
$w_1(1) = w_1(0) = [0.2, 0.6, 0.5, 0.9]^T$
Step 3: randomly select another sample, say $x_2$; we have:
$d_1 = \| w_1(1) - x_2 \|^2 = 0.66$
$d_2 = \| w_2(1) - x_2 \|^2 = 2.28$
So the first neuron wins, $i(x) = 1$; update the weights:
$w_1(2) = w_1(1) + 0.3\,[x_2 - w_1(1)] = [0.14, 0.42, 0.35, 0.93]^T$
$w_2(2) = w_2(1) = [0.92, 0.76, 0.28, 0.12]^T$
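The two steps above can be reproduced with a few lines of MATLAB. The initial weight vector $w_1(0)$ is the one stated above; $w_2(0) = [0.8, 0.4, 0.7, 0.3]^T$ is the value implied by $d_2 = 0.98$ and the updated $w_2(1)$; the learning rates 0.6 and 0.3 are taken from the example:

x1=[1 1 0 0]; x2=[0 0 0 1];
w1=[0.2 0.6 0.5 0.9]; w2=[0.8 0.4 0.7 0.3]; % initial weights from the example
% Step 2: present x1
d=[sum((w1-x1).^2), sum((w2-x1).^2)] % d = [1.86  0.98] -> neuron 2 wins
w2=w2+0.6*(x1-w2) % w2(1) = [0.92 0.76 0.28 0.12]
% Step 3: present x2
d=[sum((w1-x2).^2), sum((w2-x2).^2)] % d ≈ [0.66  2.28] -> neuron 1 wins
w1=w1+0.3*(x2-w1) % w1(2) = [0.14 0.42 0.35 0.93]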
function w=som(data,no_row,no_column)
% Generate the initial weight vectors
[no_sample, no_feature]=size(data);
no_cluster=no_row*no_column;
w=rand(no_cluster,no_feature);
% Define the initial values for the time constants and the learning rate;
eta0=0.1;
sigma0=0.707*sqrt((no_row-1)^2+(no_column-1)^2);
tau1=1000/log(sigma0);
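% The ordering (self-organizing) phase was not captured in these notes.
% The sketch below is an assumed reconstruction: it uses the Gaussian
% neighborhood with shrinking width sigma(n), a decaying learning rate,
% and the lattice distances returned by distance_matrix (defined later).
tau2=1000;
D=distance_matrix(no_row,no_column); % lattice distance between neurons
for n=1:1000
eta=eta0*exp(-n/tau2); % decaying learning rate
sigma=sigma0*exp(-n/tau1); % shrinking neighborhood width
% sampling
k=fix(rand(1)*no_sample)+1;
% compete
for i=1:no_cluster
x=w(i,:)-data(k,:);
dd(i,1)=x*x';
end
% find the winner
[p,ind]=min(dd);
% cooperate and adapt: update all neurons, weighted by the neighborhood
h=exp(-D(:,ind).^2/(2*sigma^2));
w=w+eta*(h*ones(1,no_feature)).*(repmat(data(k,:),no_cluster,1)-w);
end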
% The convergence phase.
% We first define the learning rate; it is kept constant in this phase.
eta=0.01;
for N=1:2000
% sampling
k=fix(rand(1)*no_sample)+1;
% compete
for i=1:no_cluster
x=w(i,:)-data(k,:);
dd(i,1)=x*x';
end
% find the winner
[p,ind]=min(dd);% ind is the index of the winner
% update the weight of the winning neuron (winner)
w(ind,:)=w(ind,:)+eta*(data(k,:)-w(ind,:));
end
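For reference, the function could be called as follows (the data matrix here is hypothetical, with one training sample per row, e.g. the digit patterns used later):

% data: one training sample per row
w=som(data,5,5); % train a 5-by-5 map; w holds one weight vector per neuron (row)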
% sampling
k=fix(rand(1)*no_sample)+1; % k is the index of the selected sample
% compete
for i=1:no_cluster
x=w(i,:)-data(k,:);
d(i,1)=x*x'; % squared Euclidean distance between sample k and neuron i
end
end
function D=distance_matrix(no_row,no_column)
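% (The body of distance_matrix is not shown in these notes; the sketch below
% is an assumed reconstruction. It returns the Euclidean lattice distance
% between every pair of neurons in the no_row-by-no_column grid, with neurons
% indexed column-wise so that D(:,ind) matches the rows of w.)
no_cluster=no_row*no_column;
D=zeros(no_cluster,no_cluster);
for i=1:no_cluster
[ri,ci]=ind2sub([no_row no_column],i); % lattice coordinates of neuron i
for j=1:no_cluster
[rj,cj]=ind2sub([no_row no_column],j);
D(i,j)=sqrt((ri-rj)^2+(ci-cj)^2);
end
end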
Example 2
Initial weight vectors:
After the convergence phase (denoted by “o”)
Consider 10 digits, represented by 11×8 dot matrices; we build a
5×5 2-D SOM map.
The resulting SOM, with weight vectors visualized as digits, is shown
below. Note that the weights of adjacent neurons are similar.
Samples:
The weight vectors after the convergence phase (denoted by “o”)
For the nonlinear mapping shown below, it is impossible to use a
straight line (a linear equation) to describe the data. However, a
self-organizing map built on a one-dimensional lattice of
neurons is able to overcome this approximation problem, since the
chain of weight vectors bends to follow the data distribution.
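As an illustration of this idea (the specific curve and noise level below are assumed, not from the notes), the som function above can be trained on a one-dimensional lattice; the chain of weight vectors then follows the data:

t=linspace(-1,1,500)';
xy=[t, t.^2+0.05*randn(500,1)]; % illustrative noisy nonlinear mapping y = x^2
w=som(xy,1,30); % 1-by-30 one-dimensional lattice of neurons
plot(xy(:,1),xy(:,2),'.',w(:,1),w(:,2),'o-') % neurons ordered along the curve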