Associative Self-Organizing Map

Magnus Johnsson¹, Max Martinsson¹, David Gil² and Germund Hesslow³
¹Lund University Cognitive Science, Sweden
²Computing Technology and Data Processing, University of Alicante, Spain
³Department of Experimental Medical Science, Lund University, Sweden
1. Introduction
There is not enough genetic information to specify the connectivity of the brain in detail
(Miikkulainen et al., 2005). The total number of neurons in the neocortex of an adult human
brain is estimated to be about 28 billion and the number of connections (synapses) between
them and to other cells in the brain to more than 100 trillion (Mountcastle, 1997). In
comparison, the human genome only contains about 3 billion base pairs (Consortium, 2004).
Thus a reasonable view is that the cortex is not directly specified genetically but constructed
by input-driven self-organization (Miikkulainen et al., 2005).
The self-organizing process works by using sensory input to adjust the network's organization
instead of specifying all connections in advance. The form of self-organization that seems to
be active in the cortex gives rise to a special property called topological mapping. This means
that the neurons that are activated for similar sensory inputs are found close to each other.
The properties self-organization and topology preservation are caught in the Self-Organizing
Map (SOM) (Kohonen, 1988), which shares many features with brain maps (Kohonen, 1990).
However, the cortex consists of many brain maps and different parts of the cortex obviously
interact. For example, different sensory modalities interact with each other. A dramatic
illustration of this can be seen in the McGurk-MacDonald effect. If you hear a person making
the sound /ba/ but the sound is superimposed on a video recording on which you do not
see the lips closing, you may hear the sound /da/ instead (McGurk & MacDonald, 1976). The
neural mechanisms underlying such interaction between different sensory modalities are not
known but recent evidence suggests that different primary sensory cortical areas can influence
each other.
Interaction between sensory modalities may also be important for internal simulation of
perceptions. An idea that has been gaining popularity in cognitive science in recent years
is that higher organisms are capable of simulating perception. In essence, this means that
the perceptual processes normally elicited by some ancillary input can be mimicked by the
brain (Hesslow, 2002). There is now a large body of evidence supporting this contention. For
instance, several neuroimaging experiments have demonstrated that activity in visual cortex
when a subject imagines a visual stimulus resembles the activity elicited by a corresponding
ancillary stimulus (for a review of this evidence see e.g. (Kosslyn et al., 2001); for a somewhat
different interpretation, see (Bartolomeo, 2002)).
Self Organising Maps, New Achievements
Self Organizing Maps, Applications and Novel Algorithm Design
A critical question here is how simulated perceptual activity might be elicited. One possibility
is that signals arising in the frontal lobe in anticipation of consequences of incipient actions
are sent back to sensory areas (Hesslow, 2002). Another possibility is that perceptual activity
in one sensory area can influence activity in another.
Inspired by these findings we suggest that in a multimodal perceptual model, the subsystems
of different sensory modalities should co-develop and be associated with each other. This
means that activity in the modalities that currently receive input should, at least to some
degree, elicit appropriate activity in the other sensory modalities as well. This
provides an ability to activate the subsystem for a modality even when its sensory input is
limited or nonexistent as long as there is activity in subsystems for other modalities.
Another probable ability of the brain is to elicit continued and reasonable activity in different
perceptual subsystems in the absence of input, i.e. an ability to internally simulate sequences
of perceptions as proposed in the neuroscientific simulation hypothesis (Hesslow, 2002).
This means an ability to elicit activity patterns that are normally subsequent to the present
activity pattern in a subsystem even when sensory input is absent. It would also imply an
ability to anticipate future sequences of perceptions that normally follow a certain perception
within a modality, but also over different modalities if the modalities have co-developed and
are associated. For example, a gun seen to be fired from a long distance, would yield an
anticipation of a bang to follow soon.
This chapter presents a novel variant of the SOM, called the Associative Self-Organizing Map
(A-SOM), which we think would be suitable in models that capture phenomena like those
sketched above.
The A-SOM is similar to the SOM and develops a representation of its input space, but in
addition it also learns to associate its activity with the (possibly time delayed) activities of an
arbitrary number of other neural networks, or its own earlier activity (which makes it into an
unsupervised recurrent neural network).
The A-SOM differs from earlier attempts to build associated maps, such as the Adaptive
Resonance Associative Map (Tan, 1995) and Fusion ART (Nguyen et al., 2008), in that all
layers (or individual networks) share the same structure and use topologically arranged
representations. Unlike ARTMAP (Carpenter et al., 1992), the A-SOM also allows associations
to be formed in both directions.
The most similar existing unsupervised recurrent neural network is the Recursive SOM that
feeds back its activity together with the input for the next iteration (Voegtlin, 2002). The
Recursive SOM is similar but not equivalent to the A-SOM and lacks the ability to associate with
the activity of other neural networks. Other less similar examples are the Temporal Kohonen
Map (Chappell & Taylor, 1993), the Recurrent Self-Organizing Map (Varsta et al., 1997) and
the Merge SOM (Strickert & Hammer, 2005).
The chapter both summarizes our previous work (Johnsson & Balkenius, 2008; Johnsson et al.,
2009a;b) with the A-SOM and adds new results and insights. It describes the A-SOM in detail
and its use in the modelling of cross-modal expectations and in the modelling of internal
simulation.
2. Associative self-organizing map
The A-SOM is based on the ordinary SOM and thus finds a representation of its input space.
In addition it also learns to associate its activity with (possibly delayed) additional ancillary
inputs. These ancillary inputs could be the activities of a number of external SOMs or
A-SOMs, or the earlier activity of the A-SOM itself. It consists of a grid of neurons with
a fixed number of neurons. Each neuron has multiple sets of weights: one for the main input
(which is similar to the input of an ordinary SOM) and one for each ancillary input. All
neurons receive both main input (e.g. from a sensor) and ancillary inputs (e.g. the activity in
associated representations of other sensory modalities, or the A-SOM's activity from previous
iterations). Each neuron calculates activities for its main input and for each ancillary input.
The main input activity is calculated in a way similar to the ordinary SOM, with the dot
product as the similarity measure. The adaptation of the weights corresponding to the main
input is also done as in an ordinary SOM, i.e. the neuron with the highest main activity
and the neurons in its vicinity are adjusted. The ancillary activities of a neuron are calculated
using the dot product and are adjusted by the delta rule to approach the main activity. The
total activity of a neuron is calculated by averaging the main activity and the ancillary
activities.
By connecting the total activity of the A-SOM back to itself as an ancillary input with a time
delay, the A-SOM is turned into a recurrent A-SOM able to learn sequences. This is because
the ancillary weights will then have learned to evoke activity based on the previous activity
in the A-SOM.
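This feedback loop can be sketched as follows. The wrapper below is an illustrative
interface of our own, not code from the chapter: it assumes an `asom_step` function
that maps a main input and an ancillary input to a total activity, and it feeds the
total activity back as the ancillary input after a configurable delay.

```python
from collections import deque

def make_recurrent(asom_step, delay=1, initial_ancillary=0.0):
    """Turn a one-step A-SOM update into a recurrent one by feeding the
    total activity back as a time-delayed ancillary input (sketch)."""
    buffer = deque([None] * delay, maxlen=delay)

    def step(x_main):
        # Use the delayed total activity as ancillary input; fall back to
        # a fixed initial value before the buffer has filled up.
        prev = buffer[0] if buffer[0] is not None else initial_ancillary
        total = asom_step(x_main, prev)
        buffer.append(total)
        return total

    return step
```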
Formally, the A-SOM consists of an $I \times J$ grid of neurons with a fixed number of neurons
and a fixed topology. Each neuron $n_{ij}$ is associated with $r+1$ weight vectors
$w_{ij}^a \in \mathbb{R}^n$ and $w_{ij}^1 \in \mathbb{R}^{m_1}, w_{ij}^2 \in \mathbb{R}^{m_2}, \ldots, w_{ij}^r \in \mathbb{R}^{m_r}$.
All the elements of all the weight vectors are initialized with real numbers randomly selected
from a uniform distribution between 0 and 1, after which all the weight vectors are
normalized, i.e. turned into unit vectors.

At time $t$ each neuron $n_{ij}$ receives $r+1$ input vectors $x^a(t) \in \mathbb{R}^n$ and
$x^1(t-d_1) \in \mathbb{R}^{m_1}, x^2(t-d_2) \in \mathbb{R}^{m_2}, \ldots, x^r(t-d_r) \in \mathbb{R}^{m_r}$,
where $d_p$ is the time delay for input vector $x^p$, $p = 1, 2, \ldots, r$.

The main net input $s_{ij}$ is calculated using the standard cosine metric

$$s_{ij}(t) = \frac{x^a(t) \cdot w_{ij}^a(t)}{\|x^a(t)\| \, \|w_{ij}^a(t)\|}. \quad (1)$$

The activity of the neuron $n_{ij}$ is given by

$$y_{ij}(t) = \left[ y_{ij}^a(t) + y_{ij}^1(t) + y_{ij}^2(t) + \ldots + y_{ij}^r(t) \right] / (r+1), \quad (2)$$

where the main activity $y_{ij}^a$ is calculated by using the softmax function (Bishop, 1995)

$$y_{ij}^a(t) = \frac{\left( s_{ij}(t) \right)^m}{\max_{ij} \left( s_{ij}(t) \right)^m}, \quad (3)$$

where $m$ is the softmax exponent.

The ancillary activity $y_{ij}^p(t)$, $p = 1, 2, \ldots, r$, is calculated by again using the standard
cosine metric

$$y_{ij}^p(t) = \frac{x^p(t-d_p) \cdot w_{ij}^p(t)}{\|x^p(t-d_p)\| \, \|w_{ij}^p(t)\|}. \quad (4)$$

The neuron $c$ with the strongest main activation is selected:

$$c = \arg\max_{ij} y_{ij}^a(t). \quad (5)$$

The weights $w_{ijk}^a$ are adapted by

$$w_{ijk}^a(t+1) = w_{ijk}^a(t) + \alpha(t) G_{ijc}(t) \left[ x_k^a(t) - w_{ijk}^a(t) \right], \quad (6)$$

where $0 \le \alpha(t) \le 1$ is the adaptation strength with $\alpha(t) \to 0$ when $t \to \infty$. The
neighbourhood function $G_{ijc}(t) = e^{-\frac{\|r_c - r_{ij}\|^2}{2\sigma^2(t)}}$, where $r_c \in \mathbb{R}^2$
and $r_{ij} \in \mathbb{R}^2$ are the location vectors of neurons $c$ and $n_{ij}$, is a Gaussian function
decreasing with time.

The weights $w_{ijl}^p$, $p = 1, 2, \ldots, r$, are adapted by

$$w_{ijl}^p(t+1) = w_{ijl}^p(t) + \beta x_l^p(t-d_p) \left[ y_{ij}^a(t) - y_{ij}^p(t) \right], \quad (7)$$

where $\beta$ is the adaptation strength.

All weights $w_{ijk}^a(t)$ and $w_{ijl}^p(t)$ are normalized after each adaptation.
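The update equations above can be sketched in code. The following is a minimal NumPy
sketch of our own, not the authors' implementation: it assumes a single ancillary input
(r = 1), flattens the neuron grid, and uses illustrative class and parameter names.

```python
import numpy as np

class ASOM:
    """Minimal A-SOM sketch with one ancillary input (r = 1), following
    eqs. (1)-(7). Names and default values are illustrative assumptions."""

    def __init__(self, rows, cols, dim_main, dim_anc, m=10, beta=0.35, seed=0):
        rng = np.random.default_rng(seed)
        self.m, self.beta = m, beta
        # Weights drawn uniformly from [0, 1) and normalized to unit length.
        self.w_main = rng.random((rows * cols, dim_main))
        self.w_anc = rng.random((rows * cols, dim_anc))
        self._normalize()
        # Grid positions used by the Gaussian neighbourhood function.
        self.pos = np.array([(i, j) for i in range(rows) for j in range(cols)], float)

    def _normalize(self):
        # All weight vectors are renormalized after each adaptation.
        self.w_main /= np.linalg.norm(self.w_main, axis=1, keepdims=True)
        self.w_anc /= np.linalg.norm(self.w_anc, axis=1, keepdims=True)

    @staticmethod
    def _cosine(x, w):
        # Eqs. (1) and (4): cosine similarity of one input with every weight vector.
        return (w @ x) / (np.linalg.norm(w, axis=1) * np.linalg.norm(x))

    def activities(self, x_main, x_anc):
        s = self._cosine(x_main, self.w_main)        # eq. (1): main net input
        y_main = (s / s.max()) ** self.m             # eq. (3): softmax activation
        y_anc = self._cosine(x_anc, self.w_anc)      # eq. (4): ancillary activity
        y_total = (y_main + y_anc) / 2.0             # eq. (2) with r = 1
        return y_main, y_anc, y_total

    def train_step(self, x_main, x_anc, alpha, sigma):
        y_main, y_anc, y_total = self.activities(x_main, x_anc)
        c = int(np.argmax(y_main))                   # eq. (5): winning neuron
        d2 = ((self.pos - self.pos[c]) ** 2).sum(axis=1)
        G = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian neighbourhood
        # Eq. (6): main weights move toward the main input around the winner.
        self.w_main += alpha * G[:, None] * (x_main - self.w_main)
        # Eq. (7): delta rule pulls ancillary activity toward the main activity.
        self.w_anc += self.beta * (y_main - y_anc)[:, None] * x_anc
        self._normalize()
        return y_total
```

After repeated training on a main/ancillary input pair, the ancillary activity of the
winning neuron approaches its main activity, which is what lets ancillary input alone
evoke appropriate activity later.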
3. Modelling cross-modal expectations
3.1 Associating the A-SOM with two ancillary SOMs
We have tested the A-SOM in a model of cross-modal expectations (Johnsson et al., 2009a). In
this experiment we connected an A-SOM to two ancillary SOMs and trained all three neural
networks with a set of 10 samples (Fig. 1). This set was constructed by randomly generating 10
points with a uniform distribution from a subset s = {(x, y) ∈ R²; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} of the
plane (Fig. 2, left). The selected points were then mapped to a subset of R³ by adding a third,
constant element of 0.5, yielding a training set of three-dimensional vectors. The reason for
this was that a Voronoi tessellation of the plane was calculated from the generated points, to
later aid in determining where new points in the plane were expected to invoke activity
in the A-SOM. To make this Voronoi tessellation, which is based on a Euclidean metric, useful
for this purpose with the A-SOM, which uses a metric based on the dot product, the set of points
in the plane has to be mapped so that the corresponding position vectors are unique after
normalization. One way to accomplish such a mapping is to add a constant element to each
vector; each vector will then have a unique angle in R³. We chose the value
0.5 for the constant elements to maximize the variance of the angles in R³.
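The effect of the constant third element can be checked directly. This is an illustrative
sketch of our own, not code from the chapter:

```python
import numpy as np

# Two distinct points in the plane that lie on the same ray from the origin
# become identical after normalization, so a dot-product metric cannot
# tell them apart.
a = np.array([0.2, 0.4])
b = np.array([0.4, 0.8])
assert np.allclose(a / np.linalg.norm(a), b / np.linalg.norm(b))

# Appending a constant element 0.5 gives each point in the plane a unique
# angle in R^3, so the normalized vectors stay distinct.
a3 = np.append(a, 0.5)
b3 = np.append(b, 0.5)
assert not np.allclose(a3 / np.linalg.norm(a3), b3 / np.linalg.norm(b3))
```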
The A-SOM was connected to two SOMs (using the same kind of activation as the main
activation in the A-SOM, i.e. dot product with softmax activation), called SOM 1 and SOM 2,
and thus also received their respective activities as associative input, see Fig. 1. The A-SOM,
SOM 1 and SOM 2 were then simultaneously fed with samples from the training set during a
training phase consisting of 20000 iterations. The two SOMs and the A-SOM could equally
well have been fed with samples from three different sets, always receiving the same
combinations of samples from the three sets (otherwise the system could not learn to
associate them). This can be seen as a way of simulating simultaneous input from three
different sensory modalities when an animal or a robot explores a particular object. Each of
the three representations, the A-SOM and the two SOMs, consists of 15 × 15 neurons. The
softmax exponent for each of them was set to 1000. Their learning rate α(0) was initialized
to 0.1 with a learning rate decay of 0.9999 (i.e. multiplication of the learning rate by 0.9999
in each iteration), which means the minimum learning rate, set to 0.01, will be reached at
the end of the 20000 training
iterations. The neighbourhood radius, i.e. σ of the neighbourhood function G_ijc(t) in eq. (6),
was initialized to 15 for all three representations and shrunk to 1 during the 20000 training
iterations by using a neighbourhood decay of 0.9998 (i.e. multiplication of the neighbourhood
radius by 0.9998 in each iteration). All three representations used a plane topology when
calculating the neighbourhood. The β for the associative weights in the A-SOM was set to
0.35.

Fig. 1. Schematic depiction of the connections between the two SOMs and the A-SOM in
the model of cross-modal expectations. The test system consists of three subsystems, which
develop representations of sample sets from three input spaces (for simplicity we use the
same input set for all three representations in this study). One of the representations (the
A-SOM) also learns to associate its activity with the simultaneous activities of the two SOMs.
This means that proper activity can be invoked in the A-SOM of the fully trained system even
if it does not receive any ordinary input. This is similar to cross-modal activation in humans,
e.g. a tactile perception of an object that invokes an internal visual image of the same object.
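The multiplicative decay schedules used for the learning rate and the neighbourhood radius
can be reproduced in a few lines. This is an illustrative sketch; the function name is our own:

```python
def decayed(value0, decay, minimum, iterations):
    """Multiplicative decay with a lower bound, as used for the learning
    rate and the neighbourhood radius in this experiment (sketch)."""
    v = value0
    for _ in range(iterations):
        v = max(v * decay, minimum)
    return v

alpha_end = decayed(0.1, 0.9999, 0.01, 20000)   # learning rate after training
sigma_end = decayed(15.0, 0.9998, 1.0, 20000)   # neighbourhood radius after training
```

The neighbourhood radius hits its floor of 1 well before the 20000th iteration, while the
learning rate decays smoothly toward its minimum.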
After training, the system was evaluated by feeding it with samples from the training set
again, to one, two or all three representations in all possible combinations. When a
representation did not receive any input it was fed with null vectors instead (thus simulating
the input of no signal from the sensors of the modality of that representation). The centers of
activity in the A-SOM as well as in the two SOMs were recorded for all these tests.

Fig. 2. Left: The Voronoi tessellation of the points used when constructing the training set
for the A-SOM and the two SOMs. This set was constructed by randomly generating 10
points from a subset of R² according to a uniform distribution. To make this Voronoi
tessellation, which is based on a Euclidean metric, valid as a measure of proximity, the
training set had to be transformed by adding a constant element to each sample vector. This
is because the A-SOM, which uses a dot-product-based metric and normalizes its input,
would consider all position vectors with a particular angle equal. By adding a constant
element, each point in the plane becomes a position vector in R³ with a unique angle. Right:
The same Voronoi tessellation, with the points used in the generalization test depicted. This
set was also mapped to a new set in R³ by adding a third constant element to each sample
vector, for the same reason as for the samples in the training set.
The result was evaluated by using the training set on the fully trained system. First we
recorded the centers of activation in the A-SOM when fed by main input from the training
set only (i.e. the two SOMs were fed with null vectors), and the centers of activation in
the two SOMs. Then we calculated Voronoi tessellations for the centers of activation in all
three representations (Fig. 3, uppermost row) to see if they could separate the samples, and in
particular if the A-SOM could separate the samples when fed by the activity of one or both
of the SOMs only. If the center of activation for a particular sample in the training set is
located in the correct Voronoi cell, this is considered a successful recognition of the sample,
because it means that the center of activation is closer to the center of activation of the same
sample than to the center of activation of any other sample in the training set when the A-SOM
is fed by main input only, like an ordinary SOM. By comparing the Voronoi tessellations of the
A-SOM and the two SOMs (Fig. 3) with the Voronoi tessellation of the plane for the training
set (Fig. 2), we can see that the ordering of the Voronoi cells for the training set is to a large
extent preserved in the Voronoi cells for the centers of activation in the A-SOM and the two
SOMs. In Fig. 3 we can also see that all, i.e. 100%, of the training samples are recognized in
the A-SOM as long as at least one of the three representations receives input.
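Deciding whether a center of activation lies in the correct Voronoi cell reduces to a
nearest-generator test, since a Euclidean Voronoi cell contains exactly the points that are
closer to its generator than to any other generator. A small sketch, with made-up
coordinates standing in for the recorded centers of activation:

```python
import numpy as np

def voronoi_cell(point, generators):
    """Index of the (Euclidean) Voronoi cell that `point` falls into,
    i.e. the index of the nearest generator."""
    d = np.linalg.norm(np.asarray(generators) - np.asarray(point), axis=1)
    return int(np.argmin(d))

# Made-up generators standing in for recorded centers of activation.
centers = [(0.1, 0.1), (0.8, 0.2), (0.5, 0.9)]
```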
3.1.1 Generalization
To test whether the system was able to generalize to a new set of samples, which it had not
been trained with, we constructed another set of 10 samples with the same method as for the
training set. This generalization test set was used as input to the two SOMs and the A-SOM,
i.e. each of these representations received the same sample simultaneously (or a null vector).
The generalization ability of the system was evaluated by feeding samples from the
generalization set to one, two or all three representations in all possible combinations. When
a representation did not receive any input it was fed with null vectors instead. The centers of
activity in the A-SOM as well as in the two SOMs were recorded for all these tests.
The result was evaluated by using the generalization set on the fully trained system.
We recorded the centers of activation in the A-SOM when each of the SOMs was the only
recipient of input, when both SOMs received input, when each of the SOMs and the A-SOM
received input, when all three representations received input, and when only the A-SOM
received input. As before, a representation that did not receive input was fed null vectors
(signifying the lack of sensory registration for that modality). We then looked at which
Voronoi cell the centre of activation was located in, in the A-SOM and in the SOMs, for each
sample of the generalization set. When a generalization sample belongs to the Voronoi cell
for sample k = 1, 2, . . . , 10 of the training set (see Fig. 2), and its activation in the A-SOM or
one of the SOMs is located in the Voronoi cell for the centre of activation of the same training
sample (see Fig. 3), then we consider the centre of activation for the generalization sample to
be properly located, and we consider the sample to be successfully generalized.
Leftmost in the upper row of Fig. 3 we can see that the centers of activation for all the
generalization samples except sample 8 are within the correct Voronoi cell in the A-SOM when
it receives main input only. That sample 8 falls outside the correct Voronoi cell, and only
barely so, is probably not an indication of incorrectness, because the A-SOM consists of
225 neurons and is not a continuous surface but a discretized representation.
Fig. 3. The centers of activation for different constellations of input to the fully trained system,
in the A-SOM and in the two SOMs. The centers of activation for the generalization samples
have been written on circles with a contrasting colour to differentiate them from the training
samples. Upper row left: The A-SOM when only main input to the A-SOM is received. The
Voronoi tessellation for these centers of activation has also been drawn; the same holds for
the other images in this figure depicting activations in the A-SOM. Upper row middle: SOM1
with the Voronoi tessellation for the training set drawn. Upper row right: SOM2 with the
Voronoi tessellation for the training set drawn. Middle row left: The A-SOM receiving main
input and the activity of SOM1. Middle row middle: The A-SOM receiving main input and
the activity of SOM2. Middle row right: The A-SOM receiving main input and the activities
of SOM1 and SOM2. Lower row left: The A-SOM receiving the activity of SOM1 only. Lower
row middle: The A-SOM receiving the activity of SOM2 only. Lower row right: The A-SOM
receiving the activities of SOM1 and SOM2.

In the middle of the upper row of Fig. 3 we can see that all centers of activation for the
generalization samples are correctly located in SOM1, except samples 1 and 6, which are on the
border of the correct Voronoi cell (but this should probably not be considered an indication
of incorrectness, for the same reason as mentioned above), and sample 2, which is located
close to the correct Voronoi cell.
Rightmost in the upper row of Fig. 3 we can see that all centers of activation for the
generalization samples are correctly located in SOM2, except sample 2, which is located close
to the correct Voronoi cell.
Leftmost in the middle row of Fig. 3 we can see that the centers of activation for all
the generalization samples except sample 8 (which should probably not be considered an
indication of incorrectness, for the same reason as mentioned above) are within the correct
Voronoi cell in the A-SOM when it receives main input as well as the activity of SOM1 as
input.
In the middle of the middle row of Fig. 3 we can see that the centers of activation for all
the generalization samples except sample 8 (with the same caveat as above) are within the
correct Voronoi cell in the A-SOM when it receives main input as well as the activity of SOM2
as input.
Rightmost in the middle row of Fig. 3 we can see that the centers of activation for all
the generalization samples except sample 8 (with the same caveat as above) are within the
correct Voronoi cell in the A-SOM when it receives main input as well as the activities of both
SOM1 and SOM2 as input.
Leftmost in the lower row of Fig. 3 we can see that the centers of activation for all the
generalization samples except samples 2 and 10, i.e. 80%, are within the correct Voronoi cell
in the A-SOM when it receives the activity of SOM1 as its only input.
In the middle of the lower row of Fig. 3 we can see that the centers of activation for all the
generalization samples except sample 2, i.e. 90%, are within the correct Voronoi cell in the
A-SOM when it receives the activity of SOM2 as its only input.
Rightmost in the lower row of Fig. 3 we can see that the centers of activation for all the
generalization samples except samples 2 and 10, i.e. 80%, are within the correct Voronoi cell in
the A-SOM when it receives the activities of SOM1 and SOM2 as its only input.
In Fig. 4 we can see a graphical representation of the activity in the two SOMs, as well as the
total, main and ancillary activities of the A-SOM, while receiving a sample from the
generalization set. The lighter an area is in this depiction, the higher the activity in that area.
3.1.2 Discussion
The ability of the A-SOM proved to be good in this experiment, with 100% accuracy
on the training set and about 80-90% accuracy in the generalization tests, depending on
which constellation of inputs was provided to the system. It was also observed
that the generalization in the ordinary SOMs was not perfect. Had it been perfect, the
generalization ability of the A-SOM would probably have been even better. This is probably
a matter of optimizing the parameter settings.
It is interesting to speculate, and later test, whether there are any restrictions on the sets that
are used as input to the different SOMs and A-SOMs in this kind of system. A reasonable
guess is that learning to associate the activities arising from the training sets imposes no
restrictions on the training sets, but that when it comes to generalization there is probably
one restriction: there probably needs to exist a topological mapping between the different
input spaces, so that the sequences of input samples from the
different input spaces will invoke traces of activity over time in their respective SOM or
A-SOM that would, in principle, be possible to map onto each other using only translations,
rotations, stretching and twisting. Otherwise the generalization would be mixed up, at least
partially. The same would be true if the parameter settings implied the development of
fragmented representations.

Fig. 4. Activations at a moment in the simulation. The lighter an area is in this depiction, the
higher the activity in that area. Upper row left: The activity in SOM1. Upper row right:
The activity in SOM2. Lower row left: The total activity in the A-SOM. Lower row, second
image from the left: The main activity in the A-SOM. Lower row, third image from the left:
The ancillary activity in the A-SOM due to the activity in SOM1. Lower row right: The
ancillary activity in the A-SOM due to the activity in SOM2.
3.2 Associating SOM representations of haptic submodalities
We have also tested the A-SOM together with a couple of real sensors (texture and hardness)
(Johnsson & Balkenius, 2008). This system developed representations of texture as well as
hardness, and activity in one modality could trigger an activation pattern in the other
modality resembling the pattern of activity the object would yield if explored with the sensor
for that other modality.
3.2.1 Sensors in the experiment
The system employs two sensors (Fig. 5) developed at Lund University Cognitive Science
(LUCS). One of these sensors is a texture sensor and the other is a hardness sensor.
The texture sensor consists of a capacitor microphone with a tiny metal edge mounted at the
end of a moveable lever, which in turn is mounted on a servo. When exploring a material the
lever is turned by the servo, which moves the microphone with the attached metal edge along
a curved path in the horizontal plane. This makes the metal edge slide over the explored
material, which creates vibrations in the metal edge with frequencies that depend on the
textural properties of the material. The vibrations are transferred to the microphone since
there is contact between it and the metal edge. The signals are then sampled and digitized
by a NiDaq 6008 (National Instruments) and conveyed to a computer via a USB port. The
Fast Fourier Transform (FFT) algorithm is then applied to the input, yielding a spectrogram
of component frequencies.

Fig. 5. The texture and hardness sensors while exploring a piece of foam rubber. The texture
sensor consists of a capacitor microphone (a) with a metal edge (b) mounted at the end of a
moveable lever (c), which in turn is mounted on a servo. The hardness sensor consists of a
stick (d) mounted on a servo. The servo belonging to the hardness sensor contains a variable
resistor that provides a measure of the turning of the servo, and thus of the displacement of
the stick, which is proportional to the compression of the explored material. The actuators
are controlled via an SSC-32 controller board (Lynxmotion Inc.). The resistance of the
variable resistor in the servo of the hardness sensor and the microphone signal of the texture
sensor are digitized using a NiDaq 6008 (National Instruments) and conveyed to the
computer via a USB port.
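The FFT step can be illustrated as follows. A real-valued window of 4096 samples yields
2049 frequency bins, matching the texture vectors of length 2049 described later; the sample
rate and the synthetic test signal are our own assumptions, not the settings of the NiDaq
acquisition:

```python
import numpy as np

fs = 4096                                  # samples per second (assumed)
t = np.arange(fs) / fs                     # one second of signal
# Stand-in for the microphone signal: two superimposed vibration frequencies.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(signal))     # magnitudes of component frequencies
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
```

The dominant vibration frequency then shows up as the largest bin of the magnitude
spectrum.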
The hardness sensor consists of a stick mounted on a servo. During the exploration of a
material the servo tries to move to a certain position, which causes a downward movement
of the connected stick at a constant pressure. In the control circuit inside the servo there is
a variable resistor that provides the control circuit with information about whether the servo has
reached the wanted position or not. In our design, we measure the value of this variable
resistor at the end of the exploration of the material and thus get a measure of the end
position of the stick in the exploration. This end position is proportional to the compression
of the explored material. The value of the variable resistor is conveyed to a computer and
represented in binary form.
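The binary representation can be sketched like this. The helper name and the bit ordering
are our own assumptions; the chapter only states that the reading is represented as an
18-bit binary vector:

```python
def to_binary_vector(value, bits=18):
    """Represent a non-negative integer sensor reading as a fixed-length
    binary vector, most significant bit first (sketch)."""
    return [(value >> (bits - 1 - i)) & 1 for i in range(bits)]
```

For example, a reading of 5 becomes a vector of fifteen zeros followed by 1, 0, 1.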
The actuators for both sensors are controlled from the computer via an SSC-32 controller
board (Lynxmotion Inc.). The software for the system presented in this chapter is developed
in C++ and runs within the Ikaros system (Balkenius et al., 2010). Ikaros provides an
infrastructure for computer simulations of the brain and for robot control.
3.2.2 Exploration of objects
Our system was trained and tested with two sets of samples. One set consists of 40 samples
of texture data and the other set consists of 40 samples of hardness data. These sets have
been constructed by letting both the sensors simultaneously explore each of the eight objects
described in Table 1 five times.
During the hardness exploration of an object the tip of the hardness sensor stick (Fig. 5 d) is
pressed against the object with a constant force and the displacement is measured.
Label   Object                    Estimated Hardness   Estimated Texture
a       Foam Rubber               Soft                 Somewhat Fine
b       Hardcover Book            Hard                 Shiny
c       Bundle of Paper           Hard                 Fine
d       Cork Doily                Hard                 Rough
e       Wood Doily                Hard                 Fine
f       Bundle of Denim           Soft                 Somewhat Fine
g       Bundle of Cotton Fabric   Soft                 Somewhat Fine
h       Terry Cloth Fabric        Soft                 Rough

Table 1. The eight objects used in the experiment. The objects a-h were used both for training
and testing. The materials of the objects are presented, and they are subjectively classified as
either hard or soft by the authors. A rough subjective estimation of their textural properties is
also provided.
The exploration with the texture sensor is done by letting its lever (Fig. 5 c) turn 36 degrees
during one second. During this movement the vibrations of the metal edge (Fig. 5 b) sliding
over the object are recorded by the microphone (Fig. 5 a) mounted at the end of the lever.
The output from the texture sensor from all these explorations has then been written to a file
after the application of the FFT. Likewise, the output from the hardness sensor has been written
to a file, represented as binary numbers. The hardness samples can be considered binary
vectors of length 18, whereas the texture samples can be considered vectors of length
2049. The eight objects have various kinds of texture and can be divided into two groups, one
with four rather soft objects and one with four rather hard objects. During the exploration,
the objects were fixed in the same location under the sensors.
3.2.3 Experiment
Our system (Fig. 6) is bimodal and consists of two monomodal subsystems (hardness and
texture), which develop monomodal representations (A-SOMs) that are associated with each
other. The subsystem for hardness uses the raw sensor output from the hardness sensor,
represented as a binary number with 18 bits and conveys it to an A-SOM with 15 × 15 neurons.
After training, this A-SOM will represent the hardness property of the explored objects.
In the subsystem for texture, the raw sensor output from the texture sensor is transformed by
an FFT module into a spectrogram containing 2049 frequencies, and the spectrogram, represented
as a vector, is in turn conveyed to an A-SOM with 15 × 15 neurons. After training,
this A-SOM will represent the textural properties of the explored objects.
The two subsystems are coupled to each other in that their A-SOMs also receive their
respective activities as ancillary input.
Both A-SOMs began their training with the neighbourhood radius equal to 15. The
neighbourhood radius was decreased at each iteration by multiplication with 0.998 until it
reached the minimum neighbourhood size 1. Both A-SOMs started out with α(0) = 0.1 and
decreased it by multiplication with 0.9999. β was set to 0.35 for both A-SOMs.
The system was trained with samples from the training set, described in the previous section,
for 2000 iterations before evaluation.
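The training schedule above can be sketched as follows. The constants (initial radius 15, decay factor 0.998 with floor 1, α(0) = 0.1 with decay 0.9999, β = 0.35, 2000 iterations) come from the text; the loop that applies them is our own minimal reconstruction.

```python
# Minimal reconstruction of the schedules; constants are from the text,
# the update loop is an assumption about how they were applied.
def training_schedules(iterations=2000):
    radius, alpha, beta = 15.0, 0.1, 0.35  # initial radius, alpha(0), beta
    for _ in range(iterations):
        radius = max(radius * 0.998, 1.0)  # shrink radius, floor at 1
        alpha *= 0.9999                    # learning-rate decay
    return radius, alpha, beta

radius, alpha, beta = training_schedules()
```

With these settings the radius reaches its floor of 1 well before the 2000th iteration, while the learning rate decays only slightly.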
3.2.4 Results and discussion
The results of the experiment with the texture/hardness system are depicted in Fig. 7. The
6 images depict the centres of activation in the A-SOMs when the fully trained system was
Self Organizing Maps - Applications and Novel Algorithm Design
[Figure 6 schematic: Texture Sensor → FFT → A-SOM (Texture) ↔ A-SOM (Hardness) ← Hardness Sensor]
Fig. 6. Schematic depiction over the architecture of the haptic hardness and texture
perception system. The system consists of two monomodal subsystems, which develop
monomodal representations (A-SOMs) of hardness and texture that learn to associate their
activities. The hardness subsystem uses the raw sensor output from the hardness sensor as
input to an A-SOM, which finds a representation of the hardness property of the explored
objects. The texture subsystem transforms the raw sensory data by the aid of a FFT module
and then forwards it to another A-SOM, which finds a representation of the textural
properties of the explored objects. The two A-SOMs learn to associate their respective
activities.
tested with the test set (described above) constructed with the aid of the objects a-h in Table
1. Images 7 A, 7 B and 7 C correspond to the texture representing A-SOM. Likewise the
images 7 D, 7 E and 7 F correspond to the hardness representing A-SOM. Each cell in an image
represents a neuron in the A-SOM. In the images 7 A, 7 B, 7 D and 7 E there are black circles
in some of the cells. This means that the corresponding neurons in the A-SOM are the centre
of activation for one or several of the samples in the test set. The centres of activation from
the samples in the test set corresponding to each object in Table 1 when only main input was
provided have been encircled in 7 A and 7 D to show where different objects are mapped in
the A-SOMs. Main input should be understood as texture input for the texture representing
A-SOM, and hardness input for the hardness representing A-SOM. These encirclings are also
present in the other four images, because we want to show how the A-SOMs are activated
when both main and ancillary input are provided to the system (7 B and 7 E),
and when only ancillary input is provided (7 C and 7 F). Ancillary input should be
understood as hardness input in the case of the texture representing A-SOM, and as texture
input in the case of the hardness representing A-SOM.
Fig. 7 A depicts the texture representing A-SOM in the fully trained system when tested with
the test set (only main texture input). As can be seen, most objects are mapped at separate
sites in the A-SOM (c, d, e, f, h). There are some exceptions though, namely a, b and g. So
the system is able to discriminate between individual objects when provided with main input
only, although not perfectly.
The hardness representing A-SOM in the fully trained system when tested with the test set
(only main hardness input), depicted in Fig. 7 D, also maps different objects at different sites
in the A-SOM, but not as well as the texture representing A-SOM. The hardness representing
A-SOM recognizes b, f and h perfectly and more or less blurs the others. However, the hardness
representing A-SOM perfectly discriminates hard from soft objects.
When the texture representing A-SOM receives main texture input as well as ancillary
hardness input (as can be seen in Fig. 7 B) its activations are very similar to those in Fig.
7 A. Likewise when the hardness representing A-SOM receives main hardness input as well
as ancillary texture input (as can be seen in Fig. 7 E) its activations are very similar to those in
Fig. 7 D.
Fig. 7 C depicts the activations in the texture representing A-SOM when it receives only
ancillary hardness input. As can be seen this ancillary hardness input very often triggers
an activity similar to the activity following main texture input. Likewise Fig. 7 F depicts the
activity in the hardness representing A-SOM when it receives only ancillary texture input.
Even in this case the ancillary input very often triggers an activity similar to the activity
following main input. This means that when just one modality in the system receives input,
this can trigger activation in the other modality similar to the activation in that modality when
receiving main input. Thus an object explored by both sensors during training of the system
can trigger a more or less proper activation in the representations of both modalities even
when it can be explored by just one sensor during testing. However, as can be seen in Fig.
7 C and Fig. 7 F, the activity triggered solely by ancillary input does not map every sample
properly. The worst cases are the objects c, d and g in the texture representing A-SOM (Fig. 7
C) and the objects a, b and g in the hardness representing A-SOM (Fig. 7 F). As can be seen
in Fig. 7 D, the objects c, d and g are not distinguishable in the hardness representing A-SOM,
and the objects a, b and g are not distinguishable in the texture representing A-SOM (Fig.
7 A). Thus the ancillary activity patterns for these objects are overlapping and the receiving
A-SOM cannot be expected to learn to map these patterns correctly, even if the objects were
well separated by the A-SOM when it received main input.
4. Modelling internal simulation
In this section we will focus on the use of the A-SOM as a memory for perceptual
sequences. These experiments were carried out by feeding the total activity of the A-SOM
back to itself as time-delayed ancillary input.
4.1 A bimodal system
We have set up a bimodal model consisting of two A-SOMs (Fig. 8) and tested its
ability to continue with reasonable sequences of activity patterns in the two A-SOMs in
the absence of any input. This could be seen as an ability to internally simulate expected
sequences of perceptions within a modality likely to follow the last sensory experience, while
simultaneously eliciting reasonable perceptual expectations in the other modality.
One of the A-SOMs, the A-SOM A, is a recurrent A-SOM, and one, the A-SOM B, is an ordinary
A-SOM without recurrent connections. A-SOM A is connected to A-SOM B (see Fig. 8). Thus
the activity in A-SOM A will elicit associated activity in A-SOM B.
To test the model, a set of 10 training samples was constructed (Fig. 9 left). This was done
in the same way as when testing the model of cross-modal expectations described in section
3.1 above, i.e. by randomly generating 10 points with a uniform distribution from the subset
s = {( x, y) ∈ R2 ; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} of the plane and mapping these points to a subset of R3 by
adding a third, constant element of 0.5.
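The construction of this training set can be sketched as below; the random seed is arbitrary and not from the text.

```python
import numpy as np

# 10 uniform points in the unit square, lifted to R^3 by appending 0.5.
rng = np.random.default_rng(1)  # arbitrary seed
points_2d = rng.uniform(0.0, 1.0, size=(10, 2))
training_set = np.hstack([points_2d, np.full((10, 1), 0.5)])
```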
The A-SOM A receives its main input from the constructed input set described above. In
addition, its total activity is fed back as ancillary input with a time delay of one iteration.
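The delay structure of this wiring can be sketched with a toy loop. The activation function here is a placeholder stand-in, not the A-SOM activation; only the delay-1 self-feedback into A-SOM A and the delay-0 coupling from A to B follow the text.

```python
import numpy as np

# Toy stand-in for the A-SOM activation; only the delay structure
# (delay-1 self-feedback into A, delay-0 coupling from A to B) is real.
def activate(main, ancillary):
    return 0.5 * main + 0.5 * ancillary

shape = (15, 15)
prev_total_a = np.zeros(shape)              # nothing fed back at t = 0
for t in range(3):
    main = np.ones(shape) * (t + 1)         # dummy main input
    total_a = activate(main, prev_total_a)  # A: own activity, delayed 1
    total_b = activate(main, total_a)       # B: A's activity, no delay
    prev_total_a = total_a                  # stored for the next iteration
```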
[Figure 7: six panels A-F showing the mapped objects a-h, with Soft and Hard regions marked in the hardness panels.]
Fig. 7. The mapping of the objects used in the experiments. The characters a-h refer to the
objects in Table 1. The images in the uppermost row correspond to the texture representing
A-SOM and the images in the lowermost row correspond to the hardness representing
A-SOM. Each cell in an image represents a neuron in the A-SOM, which consists of
15 × 15 = 225 neurons. A filled circle in a cell indicates that that particular neuron
is the centre of activation for one or several explorations. The occurrence of a certain letter in
the rightmost images means that there are one or several centres of activation for that
particular object at that particular place. The centres of activation from the samples in the test
set corresponding to each object in Table 1 when only main input was provided have been
encircled in the images. A: The texture representing A-SOM when tested with main texture
input. Most objects are mapped at separate sites so the system is able to discriminate
between individual objects when provided with main input, although not perfectly. B: The
texture representing A-SOM when tested with main texture input together with ancillary
hardness input. Its activations are very similar to those in A. C: The texture representing
A-SOM when it receives only ancillary hardness input. This often triggers an activity similar
to the activity following main texture input. D: The hardness representing A-SOM when
tested with main hardness input maps different objects at different sites and it perfectly
discriminates hard from soft objects. E: The hardness representing A-SOM when tested with
main hardness input together with ancillary texture input. Its activations are very similar to
those in D. F: The hardness representing A-SOM when it receives only ancillary texture input.
This often triggers an activity similar to the activity following main hardness input.
Besides the main input from the constructed input set, the A-SOM B receives the total activity
of the A-SOM A as ancillary input without any time delay. Both A-SOMs were simultaneously
fed with the 10 samples of the training set over and over again, all the time in the same
[Figure 8 schematic: A-SOM A receives Input and its own total activity (delay=1); A-SOM B receives Input and the total activity of A-SOM A (delay=0).]
Fig. 8. Schematic depiction of the connections between the two A-SOMs and the recurrent
connections of A-SOM A in the bimodal system. The bimodal system consists of two
subsystems, which develop representations of sample sets from two input spaces (for
simplicity we used the same input set for both representations). The A-SOM B learns to
associate its activity with the activity of A-SOM A. This means proper activity can be elicited
in the A-SOM B of the fully trained system even if it does not receive any main input. This is
similar to cross-modal activation in humans, e.g. a tactile perception of an object invoking an
internal visual imagination of the same object. One of the representations (A-SOM A) also
learns to reproduce the sequence of activity patterns presented during training. Thus the
sequence of activity patterns in A-SOM A elicits an appropriate sequence of activity patterns
also in A-SOM B even when this lacks main input.
sequence, during a training phase consisting of 2000 epochs (i.e. 20000 iterations). The two
A-SOMs could as well have been fed by samples from two different sets, always receiving
the same combinations of samples from the two sets (otherwise the system could not learn
to associate them). This could be seen as a way of simulating simultaneous input from
two different sensory modalities when an animal or a robot explores its environment. Each
of the two A-SOMs consisted of 15 × 15 neurons. The softmax exponent for each of them
was set to 1000. Their learning rate α(0) was initialized to 0.1 with a learning rate decay of
0.9999 (i.e. multiplication of the learning rate with 0.9999 in each iteration), which means the
minimum learning rate, set to 0.01, will be reached at the end of the 20000 training iterations.
The neighbourhood radius, i.e. σ of the neighbourhood function Gijc (t) in eq. (6), was
initialized to 15 for both A-SOMs and shrunk to 1 during the 20000 training iterations by using
a neighbourhood decay of 0.9998 (i.e. multiplication of the neighbourhood radius with 0.9998
in each iteration). Both A-SOMs used plane topology when calculating the neighbourhood. β
for the associative weights in both A-SOMs was set to 0.35.
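To illustrate the effect of a softmax exponent as large as 1000: raising normalized net inputs to that power concentrates the activity around the maximally active neuron, approaching a winner-take-all map. The function below is only a hedged illustration of the exponent's effect, not the chapter's full A-SOM activation formula.

```python
import numpy as np

# Raising normalized net inputs to a large exponent m concentrates the
# activity around the maximally active neuron (towards winner-take-all).
# Illustration only; not the chapter's full A-SOM activation formula.
def sharpened_activity(net, m=1000):
    net = net / net.max()  # normalize so the strongest response is 1
    return net ** m

net = np.array([0.98, 0.99, 1.0])
act = sharpened_activity(net)
```

Even responses within 1-2% of the maximum are suppressed to near zero, leaving a sharply peaked activity map.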
After training, weight adaptation was turned off and the system was tested by feeding both
A-SOM A and A-SOM B with the 10 samples from the training set once again in the same
sequence as during the training phase, i.e. the system received input for one epoch. The
centres of activity for each sample in both A-SOMs were recorded, and the corresponding
Voronoi tessellations for the A-SOMs were calculated (Fig. 9 middle and right). The centres
of activity, of course, always correspond to the localizations of the neurons in the A-SOMs.
However, if we consider the centres of activity to be points in the plane, then we can calculate
a Voronoi tessellation of the plane according to these points. In this way we will also get a
division of the grid of neurons of each A-SOM. This is because each neuron in an A-SOM will
be localized in a Voronoi cell or on the border between several Voronoi cells (when we see the
localizations of the neurons as points in the plane).
Voronoi tessellations for the activity centres of the A-SOMs are used to assess the performance
[Figure 9: three 15 × 15 panels (X and Y axes from 0 to 15) showing Voronoi cells labelled 1-10.]
Fig. 9. Left: The Voronoi tessellation of the points used when constructing the training set
used as input to the two A-SOMs in the bimodal system. Middle and Right: The Voronoi
tessellations corresponding to the centres of activity during the first epoch of the test phase
for the two A-SOMs. The image in the middle depicts the Voronoi tessellation of the fully
trained A-SOM A together with the 10 centres of activity corresponding to the 10 first
iterations of the test phase when the system received input from the sample set. The right
image depicts the same but for the fully trained A-SOM B.
of the system. This is done in the following way: During the first epoch after training when
the A-SOMs received main input, we recorded the sequences of Voronoi cells containing the
centres of activity for the sequences of activity patterns in both A-SOMs. After the first epoch
the A-SOMs did not receive main input anymore, i.e. only null vectors were received as main
inputs. Nevertheless, sequences of activity patterns continued to be elicited in both A-SOMs. This
means the system continued to run on internal states only. This is possible since A-SOM
A received its own total activity as ancillary input with a time delay of one iteration and
A-SOM B received the total activity of A-SOM A as ancillary input without any time delay.
For each of the following 25 epochs (without any main input to the A-SOMs) we recorded
whether the centre of activity for each iteration in the epoch was in the correct Voronoi cell.
A centre of activity is considered correct if it falls in the correct Voronoi cell, because it is then
closer to the centre of activity (from the first test epoch) that corresponds to that cell than to
any other centre of activity from the first test epoch.
This procedure enabled us to calculate the percentage of correct activity patterns
for each of the 25 epochs without main input to the A-SOMs during the test phase. During
these 25 epochs the activity is elicited solely by recurrent connections.
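The correctness criterion amounts to nearest-centre classification: a centre of activity lies in the correct Voronoi cell exactly when the corresponding reference centre from the first test epoch is its nearest reference centre. A minimal sketch with made-up reference coordinates:

```python
import numpy as np

# A centre of activity is "correct" iff its nearest reference centre
# (recorded during the first test epoch) is the one it should match,
# which is the same as lying in that centre's Voronoi cell.
def in_correct_voronoi_cell(centre, reference_centres, correct_index):
    dists = np.linalg.norm(reference_centres - centre, axis=1)
    return int(np.argmin(dists)) == correct_index

# Made-up reference centres for three activity patterns.
refs = np.array([[2.0, 2.0], [10.0, 3.0], [7.0, 12.0]])
hit = in_correct_voronoi_cell(np.array([3.0, 2.5]), refs, 0)
miss = in_correct_voronoi_cell(np.array([9.0, 4.0]), refs, 0)
```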
In Fig. 9 (middle and right) we can see that both A-SOMs perfectly discriminate between
the 10 samples in the sample set, and by comparing the Voronoi tessellations of the A-SOMs
(Fig. 9 middle and right) with the Voronoi tessellation of the plane for the training set (Fig. 9
left) we can see that the ordering of the Voronoi cells for the training set is to a large extent
preserved for the Voronoi cells for the centres of activation in the A-SOMs.
Fig. 10 shows the percentages of correct activity patterns in each epoch (i.e. each sequence of
10 iterations) for each of the first 25 epochs when the system no longer received any main
input. The diagram to the left in Fig. 10 depicts the result for A-SOM A, whereas the diagram
to the right in Fig. 10 depicts the result for A-SOM B. As can be seen the percentage of correct
Fig. 10. The percentages of correct activity patterns in each epoch (i.e. sequence of 10
iterations) for each of the first 25 epochs when the system no longer received any main
input. The diagram to the left depicts this for A-SOM A, whereas the diagram to the right
depicts it for A-SOM B.
activity patterns is 100% for the first 9 epochs without main input in both A-SOM A and
A-SOM B. The percentage of correct activity patterns then declines gradually in both A-SOMs,
and at the 25th epoch it is 60% for A-SOM A and 20% for A-SOM B.
In Fig. 11 we can see a graphical representation of the total, main and ancillary activities of
the two A-SOMs when these receive input from the sample set as well as when they do not.
The lighter an area is in this figure, the higher the activity is in that area.
To summarize, our model has shown an ability to continue producing proper sequences of
activity in both A-SOMs for several epochs even after they have stopped receiving any
main input. These results confirm the model's capacity for internal simulation as well as for
cross-modal activation.
4.2 More experiments with recurrently connected A-SOMs
We have done three more experiments to investigate the properties, capabilities and
limitations of the A-SOM when its activity is connected recurrently to itself as ancillary input
with a delay of one iteration.
These experiments were inspired by Elman (1990) where a recurrent network is trained with
sequences starting with a consonant and followed by a variable number of vowels. In his
example he used the three sequences ’ba’, ’dii’ and ’guuu’. Each of these six letters (b, a, d, i,
g and u) is coded as a vector of six binary digits. Elman (1990) motivates each of the positions
of the vector as a feature of the letter such as consonant/vowel, if it is interrupted, hard,
articulated in the back of the mouth and if it is voiced. These different features are, however,
irrelevant for the functioning of the neural network. As long as the vectors for the letters are
distinct from each other, the network will learn them.
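The letter coding and a semi-random stream of the three sequences can be sketched as below. The particular bit patterns are an assumption; as noted above, any mutually distinct binary codes will do, and Elman's original feature-based codes are not reproduced here.

```python
import random

# Hypothetical 6-bit codes; any mutually distinct binary vectors work.
letters = {'b': (1, 0, 0, 0, 0, 0), 'a': (0, 1, 0, 0, 0, 0),
           'd': (0, 0, 1, 0, 0, 0), 'i': (0, 0, 0, 1, 0, 0),
           'g': (0, 0, 0, 0, 1, 0), 'u': (0, 0, 0, 0, 0, 1)}
sequences = ['ba', 'dii', 'guuu']

def make_training_stream(n_sequences, seed=0):
    """Concatenate the sequences in random order and encode each letter."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_sequences):
        stream.extend(rng.choice(sequences))  # e.g. 'dii' -> 'd','i','i'
    return [letters[ch] for ch in stream]

stream = make_training_stream(100)
```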
In all three experiments the neighbourhood radius was initialized to the same size as the
network, i.e. in an n × n A-SOM the neighbourhood radius was initialized to n. The
neighbourhood radius was multiplied by 0.9998 in each iteration, thus decaying exponentially.
The learning rate was initialized to 0.1 and also decayed, but by a factor of 0.9999. Minimum
values for the neighbourhood radius and the learning rate were set to 1 and 0.01, respectively.
[Figure 11: activity images at the first activation of Epoch 0, Epoch 1 and Epoch 9, showing Total, Main and Ancillary Activity for A-SOM A and A-SOM B.]
Fig. 11. Activations at three different occasions in the simulation with the bimodal system.
The lighter an area is in the figure, the higher the activity is in that area. Column 1 (from the
left): The three kinds of activity in A-SOM A when receiving the first sample of the sequence
in the test phase. Column 2: The activities in A-SOM B when receiving the first sample of the
sequence in the test phase. Column 3: The activities in A-SOM A when simulating the
activity corresponding to sample 1 in the first epoch without input. Notice that there is no
main activity. Column 4: The activities in A-SOM B elicited by the associative connections
from A-SOM A due to the latter A-SOM's activity corresponding to the simulated activity of
sample 1 in the first epoch without input. Column 5: The activities in A-SOM A when
simulating the activity corresponding to sample 1 in the ninth epoch without input. This is
the last epoch with perfect recall in the first cycle (see Fig. 10). Column 6: The activities in
A-SOM B elicited by the associative connections from A-SOM A due to the latter A-SOM's
activity corresponding to the simulated activity of sample 1 in the ninth epoch without input.
In all experiments the training phase lasted for 20 000 iterations.
In our experiments we created the training data in the same way as Elman (1990), i.e. by
repeating the sequences above in random order. This results in a semi-random sequence,
i.e. the consonants occur randomly but vowels always follow the consonants in a consistent
manner (Elman, 1990). Structuring the input data in this semi-random way has an obvious
advantage in that the network can be taught several sequences of input data. By having
several sequences, the network is forced to generalize. The test data
was produced in the same way as described above, but only leaving the first letter of every
[Figure 12: left, the ideal state diagram with states B, D, G, A, I and U1-U3 connected by 100% transitions; right, state diagrams for the 10 × 10, 15 × 15, 20 × 20 and 25 × 25 networks with transition percentages.]
Fig. 12. Left: An example of an ideal state diagram for three sequences of length 5. Right:
State diagrams for networks with 10 × 10, 15 × 15, 20 × 20 and 25 × 25 nodes.
sequence untouched and changing the others to null vectors (thus simulating no input at all).
For every test iteration the coordinates of the winner neuron on the A-SOM surface were
recorded. Then we analysed the sequential order of the winners by generating state diagrams.
A state diagram can visualize a system's behaviour by specifying a number of states that
the system may be in, as well as possible transitions between these states. A state diagram
is generally drawn as a number of boxes, representing states, and arrows, representing
the transitions. We label transitions with a percentage, indicating the probability for that
transition compared to all transitions from the same source state. This means that our state
diagrams are graphical representations of non-deterministic finite state machines, or what is
also referred to as Markov chains.
An example of an ideal state diagram for three sequences of length 5 is shown to the left in
Fig. 12. There will of course also be transitions from the topmost states to the bottom states
(from the last letter of each sequence to the first letter of each following sequence), with, on
average, a transition probability of 33%. However, these have been left out since they only
indicate the start of a new sequence.
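Estimating the transition percentages from a recorded sequence of winner states can be sketched as follows; the example winner sequence is made up for illustration.

```python
from collections import Counter, defaultdict

# Count transitions between consecutive winner states and normalize each
# source state's counts into the percentages that label the arrows.
def transition_probabilities(winners):
    counts = defaultdict(Counter)
    for src, dst in zip(winners, winners[1:]):
        counts[src][dst] += 1
    return {src: {dst: n / sum(c.values()) for dst, n in c.items()}
            for src, c in counts.items()}

# Made-up winner sequence for illustration.
winners = ['B', 'A', 'D', 'I', 'I', 'B', 'A', 'G', 'U', 'U', 'U']
probs = transition_probabilities(winners)
```

Each source state's outgoing probabilities sum to 1, as required for the Markov-chain reading of the diagrams.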
4.2.1 Different lengths, same vowels
The first experiment was a replication of one of Elman's (1990) experiments, where the network
was trained on sequences representing ’ba’, ’dii’ and ’guuu’. That is, three sequences starting
with a unique consonant followed by one, two or three of the same vowel. For these
experiments, validation was made to find out which winner neuron corresponded to which
letter in the sequences. This was done by simply giving the letters as input and registering
which neuron was the winner for each letter. The A-SOM was then tested with test data
constructed in the same way as the training data but with the vowels substituted with null
vectors, as described above. The state diagrams for this experiment are shown to the right in
Fig. 12 for four different sizes of the A-SOM; 10 × 10, 15 × 15, 20 × 20 and 25 × 25. To make
the diagrams more readable, only the correct connections have been included. As an example
of the activity in this experiment, Fig. 13 (left) shows the activity of the network one iteration
Fig. 13. Left: Activity pattern one iteration after presenting a ’B’ in the first experiment with
same vowels, different lengths. Right: Activity patterns for sequences in the experiment with
same length and same vowels.
after presenting it with the letter B. The validated areas are shown with arrows. The letter ’u’
is represented in two places (both are activated during validation), as is ’i’, but ’i’ elicits
maximum activity in different areas based on its position in the sequence, as indicated by i1
and i2 .
In Fig. 13 (left) we can also see that there is a black spot in the lower left corner where there
is no activity at all. One interpretation of this is that the network has learnt that a ’B’ never
follows an ’a’. One can also see that there is a little bit of activity in the rest of the network,
even in parts that do not seem to represent any letter at all. The reason for this could be that
for the lower left ’B’ to be entirely void of activity, there needs to be activity elsewhere to make
that part have relatively lower activity, i.e. activity must be relative to all other activity.
4.2.2 Same length, same vowels
Having run Elman’s original test, the experiment was modified so that the sequences were of
equal length. They were all set to be three letters long; one consonant and two equal letters
(’baa’, ’dii’ and ’guu’). The analysis of this experiment was different. Pictures were taken of
the total activity of the network and assembled into a composite image so that the sequences of
activity could be visualized easily. This was done to make a qualitative analysis of interesting
ways in which the network represents different relations between states.
Fig. 13 (right) shows the activity of the network in the test of this experiment. In Fig. 13a,
the activity patterns of the network are grouped by sequence, to enable comparisons within
sequences, whereas Fig. 13b enables comparisons between sequences. One can see that activity
patterns are distinct from each other for different positions even though the letter is the same
(horizontally compare the first ’a’ and second ’a’ following a ’B’ in Fig. 13a), though very
similar for the same position between different trials (vertically compare the first ’a’s in Fig.
13a).
4.2.3 Same length, unique vowels
In the third experiment all sequences had the same length and all vowels were different, even
within sequences. Note that we use the word vowel here to mean an element in the sequence;
there is no connection to alphabetical vowels. The length of the sequences ranged from 2 to
19 elements. The aim here was to find the smallest network that could represent sequences of
[Figure 14: data points with fitted exponential trend y = 18.268·e^(0.214x), R² = 0.975; vertical axis 0-2500 nodes, horizontal axis 9-57 letters.]
Fig. 14. Graph plotted for the number of nodes in smallest network with regard to the total
number of letters in the sequences.
each length with 100% correctness or very close to it. To speed up this process we made the
assumption that the minimum network size would not decrease when increasing the sequence
length. That is, for a new sequence length the initial network size tested was the network size
of the previously run test. It has turned out that this assumption does not hold strictly. In
one training trial it was discovered that a network of 9 × 9 nodes was able to represent the
sequences, while a network of 10 × 10 nodes performed worse. So even though the required
network size seemed to increase with increased sequence length overall, there are minor local
variations to this rule. This may simply be the effect of random variations in training data or
the initial connection weights.
No validation was made for whether the sequence of winners from the testing was in the
correct order. Only the pattern of states and their transition percentages were used, and it
was manually tested whether the state diagram fitted with an ideal diagram, as shown in Fig.
12 (left). This should not be a problem since it would be extremely unlikely that an incorrect
state transition would have a probability of 100%.
When running the trial where the sequence length was nine, no network seemed to be able to
represent the sequences fully. Sizes up to 25 × 25 nodes were tried without success. It is worth
mentioning that sequences of length eight only required 9 × 9 nodes. However, regenerating the
test and training data and running the trial again resolved the matter, and a 9 × 9 network was
found that performed at 100%. This could indicate that the network is sensitive to the training
and test data, but to be certain further research should be done on the difference between
these two training/test-sets.
Running the experiment with every trial having three sequences of the same length and all
unique letters, and then plotting the smallest network size that could represent the sequences,
produced the graph seen in Fig. 14. The graph shows that the number of nodes in the smallest
network grows exponentially with the total number of letters in the sequences of the trial.
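One standard way to obtain such an exponential trend is a log-linear least-squares fit. The sketch below only illustrates the procedure on noise-free synthetic points generated from the reported curve (a = 18.268, b = 0.214), not on the chapter's actual measurements.

```python
import numpy as np

# Fit y = a * exp(b * x) by linear least squares on log(y).
def fit_exponential(x, y):
    b, log_a = np.polyfit(x, np.log(y), 1)  # slope, intercept
    return np.exp(log_a), b

# Noise-free synthetic points from the reported curve (illustration only).
x = np.arange(1.0, 11.0)
y = 18.268 * np.exp(0.214 * x)
a, b = fit_exponential(x, y)
```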
4.2.4 Discussion
A simplification made in these experiments, to keep the analysis fast and straightforward,
was to use only the sequence of winner neurons. Other activity of
the network has thus been ignored. As one can see in Fig. 13 (left), the top area of the network
(the ’a’) is the winner here, but there is still much activity in other parts of the network. This
can also be seen in the activity sequence series, Fig. 13 (right), where the same letter in different
positions of the sequence exhibits different activity patterns. These same letter patterns have
distinct winners. This means that it is not completely satisfactory to record only the winner
neurons.
What one would want, rather, is to use the entire activity pattern instead of only the winner
neuron. This would require some method to classify similar activity patterns, while separating
not too similar patterns. Incidentally, this is a very suitable task for a regular SOM and we
could thus use the activity of the A-SOM as input to a separate SOM, an analysis SOM, that
would classify the activity of the A-SOM. Then the winner neurons of the analysis SOM, rather
than the A-SOM, could be used to determine whether the sequences had been learnt.
5. Conclusion
We have presented a novel variant of the Self-Organizing Map called the Associative
Self-Organizing Map (A-SOM), which develops a representation of its input space but also
learns to associate its activity with the activities of an arbitrary number of (possibly time
delayed) ancillary inputs. The A-SOM has been explored in several experiments.
In one experiment we connected an A-SOM to two ancillary SOMs and tested it with randomly
generated points from a subset of the plane. The system in this experiment could be seen as
a model of a neural system with two monomodal representations (the two SOMs) and one
multimodal representation (the A-SOM) constituting a neural area that merges three sensory
modalities into one representation.
In another experiment we used the A-SOM in a bimodal self-organizing system for object recognition that used real sensors for the haptic submodalities hardness and texture. The results from this experiment are encouraging. The system turned out to be able to discriminate individual objects based on input from each submodality, as well as to discriminate hard from soft objects. More importantly, the input to one submodality was shown to be sufficient to trigger an activation pattern in the other submodality that resembles the pattern of activity the object would yield if explored with the sensor for that other submodality.
In other experiments we explored the ability of the A-SOM to learn sequences and we
presented an A-SOM based bimodal model of internal simulation, and tested its ability to
continue with reasonable sequences of activity patterns in its two A-SOMs in the absence of
any input.
It is worth noting that, although this has not yet been tested, the authors can see no impediment to having several sets of connections that feed back the total activity of the A-SOM to itself as ancillary input, each with a different time delay. This would probably enhance the ability for internal simulation and for remembering perceptual sequences (at the cost of more computation).
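The multiple time-delayed feedback connections suggested above could be arranged as a bank of delay lines, one per delay length. A minimal sketch, with all names hypothetical and the A-SOM update itself omitted:

```python
from collections import deque
import numpy as np

class DelayedFeedback:
    """Feeds the A-SOM's total activity back to itself as several
    ancillary inputs, each with a different time delay."""

    def __init__(self, activity_dim, delays=(1, 2, 4)):
        # one FIFO buffer per delay, pre-filled with zero activity
        self.buffers = {d: deque([np.zeros(activity_dim)] * d, maxlen=d)
                        for d in delays}

    def step(self, total_activity):
        """Push the current total activity and return, per delay, the
        activity from that many time steps back."""
        delayed = {d: buf[0].copy() for d, buf in self.buffers.items()}
        for buf in self.buffers.values():
            buf.append(total_activity.copy())
        return delayed

fb = DelayedFeedback(activity_dim=3, delays=(1, 2))
out0 = fb.step(np.array([1.0, 0.0, 0.0]))  # all channels still zero (warm-up)
out1 = fb.step(np.array([0.0, 1.0, 0.0]))  # delay-1 channel now returns [1, 0, 0]
```

Each delayed copy would then be treated as one more ancillary input with its own set of associative weights, exactly like the ancillary inputs from external networks.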
Among other unsupervised recurrent architectures, the Recursive SOM (Voegtlin, 2002) is probably the most similar to an A-SOM with recurrent connections, so some similarities and differences are worth commenting on. The two architectures mainly differ in how a winner neuron is selected. In the Recursive SOM, the selection of a winner depends on both the input vector and the time-delayed feedback activity. This is not the case for the A-SOM, where winner selection depends only on the input vector. Because of this, a reasonable guess is that the A-SOM with recurrent connections would outperform the Recursive SOM at classifying single inputs irrespective of where in the sequence they occur, since the organization of the A-SOM is completely independent of the recurrent input. The recurrent connections in the A-SOM are ancillary connections, which means there is a separate set of weights that are adjusted during learning to produce ancillary activity similar to the main activity. There might, of course, also be disadvantages of the
A-SOM with recurrent connections when compared to the Recursive SOM. This would need
further investigation.
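The difference in winner selection can be made concrete with a small sketch. The Recursive SOM distance below follows the usual formulation with a context-weight term; the weights and inputs are purely illustrative:

```python
import numpy as np

def asom_winner(x, w):
    """A-SOM: the winner depends only on the main input vector x."""
    return int(np.argmin(np.linalg.norm(w - x, axis=1)))

def recursive_som_winner(x, y_prev, w, c, alpha=1.0, beta=0.5):
    """Recursive SOM: the winner also depends on the time-delayed map
    activity y_prev, matched against a separate set of context weights c."""
    d = (alpha * np.linalg.norm(w - x, axis=1) ** 2
         + beta * np.linalg.norm(c - y_prev, axis=1) ** 2)
    return int(np.argmin(d))

w = np.array([[0.0, 0.0], [1.0, 1.0]])  # input weights, 2 units
c = np.array([[1.0, 0.0], [0.0, 1.0]])  # context weights (dim = no. of units)
x = np.array([0.4, 0.4])                # the same input in all cases

# The A-SOM winner ignores previous activity entirely...
win_a = asom_winner(x, w)                                   # unit 0
# ...while the Recursive SOM winner shifts with the feedback context.
win_r1 = recursive_som_winner(x, np.array([1.0, 0.0]), w, c)  # unit 0
win_r2 = recursive_som_winner(x, np.array([0.0, 1.0]), w, c)  # unit 1
```

This illustrates why the A-SOM's self-organization, and hence its classification of single inputs, is unaffected by the recurrent input, whereas the Recursive SOM's winner for the same input can change with sequence position.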
The A-SOM actually develops several representations: one representation for its main input (the main activity), one representation for each of the ancillary neural networks it is connected to (the ancillary activities), and one representation that merges these individual representations (the total activity). One could speculate whether something similar is found in the cortex; perhaps these different representations correspond to different cortical layers.
6. Acknowledgements
This work was supported by the Swedish Linnaeus project Cognition, Communication and
Learning (CCL), funded by the Swedish Research Council.
7. References
Balkenius, C., Moren, J., Johansson, B. & Johnsson, M. (2010). Ikaros: Building cognitive
models for robots, Advanced Engineering Informatics 24(1): 40–48.
Bartolomeo, P. (2002). The relationship between visual perception and visual mental imagery:
a reappraisal of the neuropsychological evidence, Cortex 38: 357–378.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Oxford University Press.
Carpenter, G., Grossberg, S., Markuzon, N., Reynolds, J. & Rosen, D. (1992). Fuzzy
ARTMAP: A neural network architecture for incremental supervised learning of
analog multidimensional maps, IEEE Transactions on Neural Networks 3: 698–713.
Chappell, G. J. & Taylor, J. G. (1993). The temporal Kohonen map, Neural Networks 6: 441–445.
International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human genome, Nature 431(7011): 931–945.
Elman, J. (1990). Finding structure in time, Cognitive Science 14(2): 179–211.
Hesslow, G. (2002). Conscious thought as simulation of behaviour and perception, Trends Cogn
Sci 6: 242–247.
Johnsson, M. & Balkenius, C. (2008). Associating SOM representations of haptic submodalities, in S. Ramamoorthy & G. M. Hayes (eds), Towards Autonomous Robotic Systems 2008, pp. 124–129.
Johnsson, M., Balkenius, C. & Hesslow, G. (2009a). Associative self-organizing map,
International Joint Conference on Computational Intelligence (IJCCI) 2009, pp. 363–370.
Johnsson, M., Balkenius, C. & Hesslow, G. (2009b). Neural network architecture for
crossmodal activation and perceptual sequences, Papers from the AAAI Fall Symposium
(Biologically Inspired Cognitive Architectures) 2009, pp. 85–86.
Kohonen, T. (1988). Self-Organization and Associative Memory, Springer Verlag.
Kohonen, T. (1990). The self-organizing map, Proceedings of the IEEE 78(9): 1464–1480.
Kosslyn, S., Ganis, G. & Thompson, W. L. (2001). Neural foundations of imagery, Nature Rev
Neurosci 2: 635–642.
McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices, Nature 264: 746–748.
Miikkulainen, R., Bednar, J. A., Choe, Y. & Sirosh, J. (2005). Computational maps in the visual
cortex, Springer.
Mountcastle, V. (1997). The columnar organization of the neocortex, Brain 120(4): 701–722.
Nguyen, L. D., Woon, K. Y. & Tan, A. H. (2008). A self-organizing neural model for multimedia
information fusion, International Conference on Information Fusion 2008, pp. 1738–1744.
Strickert, M. & Hammer, B. (2005). Merge SOM for temporal data, Neurocomputing 64: 39–71.
Tan, A. H. (1995). Adaptive resonance associative map, Neural Networks 8: 437–446.
Varsta, M., Millan, J. & Heikkonen, J. (1997). A recurrent self-organizing map for temporal
sequence processing, ICANN 1997.
Voegtlin, T. (2002). Recursive self-organizing maps, Neural Networks 15: 979–991.