# Self Organizing Map

The Self Organizing Map (SOM, Kohonen map), invented by Teuvo Kohonen, can be considered a tool for dimensionality reduction.

High-dimensional input vectors are mapped to a (typically) 2D map such that input vectors that are 'near' in the high-dimensional input space are mapped to 'near' positions in the 2D map. In this sense the mapping is neighborhood-preserving.

This is a nice property: if we move slowly in the low-dimensional latent space, we also move - more or less - continuously in the high-dimensional space.

To achieve this, two mechanisms are combined:

1. vector quantization to prototype nodes
2. simultaneous adaptation of neighboring prototype nodes toward the current input vector
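These two mechanisms can be combined in a single update step. Here is a minimal sketch with NumPy; the map size, learning rate, and neighborhood width are arbitrary choices for illustration:

```python
import numpy as np

def som_step(weights, grid, x, lr=0.1, sigma=1.0):
    """One SOM update: find the BMU, then pull the BMU and its
    grid neighbors a little bit toward the input vector x."""
    # 1. Vector quantization: best matching unit = closest prototype
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # 2. Neighborhood adaptation: nodes close to the BMU on the 2D
    #    grid are moved toward x as well, weighted by a Gaussian
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)   # squared grid distances
    h = np.exp(-d2 / (2 * sigma ** 2))             # neighborhood weights
    weights += lr * h[:, None] * (x - weights)
    return weights

# 10x10 map of 2D prototype vectors, initialized with zeros
grid = np.array([(i, j) for i in range(10) for j in range(10)], float)
weights = np.zeros((100, 2))
weights = som_step(weights, grid, np.array([1.0, 2.0]))
```

Note that the neighborhood weight `h` decays with the distance on the 2D map, not with the distance in input space - this is what makes the mapping neighborhood-preserving.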

The following visualizations should help to illustrate the interplay of these two mechanisms.

There is also a short intro to SOMs written by Teuvo Kohonen.

## Vector quantization only

- blue circles: sample 2D input vectors from a 2D Swiss roll
- red circles: current prototype vectors, each represented by a prototype SOM node
- yellow lines between two nodes: these nodes are neighbors in the 2D map

If we just initialize 10×10=100 prototype nodes with zero vectors and adapt only the best matching unit (BMU) a little bit in the direction of the current training vector, we get this:

Only 2 of 100 nodes are used to represent the input data.

If we initialize the 100 nodes with random vectors around the mean input vector, a few more nodes are used - but still only a small fraction of all nodes that we could use to represent the input data, i.e. we do not use the full representative power of all SOM nodes.

```
0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000
```
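The effect of this BMU-only adaptation can be reproduced in a few lines (a minimal sketch; uniform 2D samples stand in for the Swiss roll, and the learning rate is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.zeros((100, 2))            # 100 prototypes, all at the origin
used = set()
for _ in range(1000):
    x = rng.uniform(0, 1, size=2)       # stand-in for the Swiss roll samples
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    weights[bmu] += 0.1 * (x - weights[bmu])   # adapt only the BMU
    used.add(bmu)
print(len(used))   # only a small fraction of the 100 nodes ever wins
```

Once a node has moved toward the data, it keeps winning for most inputs, so the remaining nodes never leave their initial position.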

Note that Teuvo Kohonen stressed the point that a random initialization of the SOM can be regarded as some 'historical ballast':

> Due to the many stages in the development of the SOM method and its variations, there is often useless historical ballast in the computations.
>
> For instance, one old ineffective principle is random initialization of the model vectors. Random initialization was originally used to show that there exists a strong self-organizing tendency in the SOM, so that the order can even emerge when starting from a completely unordered state, but this need not be demonstrated every time. On the contrary, if the initial values for the model vectors are selected as a regular array of vectorial values that lie on the subspace spanned by the eigenvectors corresponding to the two largest principal components of input data, computation of the SOM can be made orders of magnitude faster, since (i) the SOM is then already approximately organized in the beginning, (ii) one can start with a narrower neighborhood function and smaller learning-rate factor.

Teuvo Kohonen
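The principal-component initialization Kohonen describes can be sketched as follows (a minimal sketch with NumPy; the grid size and the scaling of the grid by the standard deviations are arbitrary choices):

```python
import numpy as np

def pca_init(data, rows=10, cols=10):
    """Initialize SOM prototypes on a regular grid in the plane
    spanned by the two largest principal components of the data."""
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    pc = evecs[:, order[:2]]              # top-2 principal directions
    s = np.sqrt(evals[order[:2]])         # scale grid by std. deviation
    # regular grid of coefficients in [-1, 1] x [-1, 1]
    u = np.linspace(-1, 1, rows)
    v = np.linspace(-1, 1, cols)
    coeffs = np.array([(a, b) for a in u for b in v])
    return mean + (coeffs * s) @ pc.T

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))          # some 3D input data
weights = pca_init(data)                  # 100 prototypes, shape (100, 3)
```

With this initialization the map is already approximately organized before training starts, which is exactly point (i) in the quote above.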

## Vector quantization with adaptation of neighbored prototype nodes

Something cool happens if we adapt not only the BMU toward the current training input vector but also its 8 neighbors (assuming a 2D grid topology):

The nodes now distribute over the places in the 2D input space where the input actually comes from.

```
0.000 0.000 0.000 0.000 0.000
0.000 0.500 0.500 0.500 0.000
0.000 0.500 1.000 0.500 0.000
0.000 0.500 0.500 0.500 0.000
0.000 0.000 0.000 0.000 0.000
```
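The kernel above (weight 1.0 for the BMU, 0.5 for its 8 direct grid neighbors) can be generated like this (a minimal sketch; the weights match the matrix shown):

```python
import numpy as np

def rect_kernel(size=5, bmu_weight=1.0, neighbor_weight=0.5):
    """Neighborhood kernel: the BMU in the center, its 8 direct
    grid neighbors get a smaller weight, everything else is 0."""
    k = np.zeros((size, size))
    c = size // 2
    k[c - 1:c + 2, c - 1:c + 2] = neighbor_weight  # 3x3 block of neighbors
    k[c, c] = bmu_weight                           # the BMU itself
    return k

print(rect_kernel())
```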

## SOM with inhibition

In theory, a neighborhood adaptation function is often proposed that also incorporates some small inhibition, e.g. the Mexican hat function.
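The Mexican hat can be approximated as a difference of two Gaussians (a minimal sketch; the two widths and the inhibition strength are arbitrary choices):

```python
import numpy as np

def mexican_hat(size=5, sigma1=0.8, sigma2=2.0, inhibition=0.3):
    """Difference-of-Gaussians kernel: excitation near the center,
    a negative (inhibitory) surround further out."""
    c = size // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    d2 = x ** 2 + y ** 2                      # squared grid distances
    g1 = np.exp(-d2 / (2 * sigma1 ** 2))      # narrow excitation
    g2 = np.exp(-d2 / (2 * sigma2 ** 2))      # wide inhibition
    return g1 - inhibition * g2

k = mexican_hat()   # positive center, negative surround
```

The `inhibition` parameter controls how strongly distant nodes are pushed away from the input; this is the knob that decides between the two behaviors shown below.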

But if the inhibition is too strong, the SOM can 'explode':

```
 0.000 -0.200 -0.200 -0.200  0.000
-0.200  0.500  0.500  0.500 -0.200
-0.200  0.500  1.000  0.500 -0.200
-0.200  0.500  0.500  0.500 -0.200
 0.000 -0.200 -0.200 -0.200  0.000
```

If instead you use a small inhibition, the SOM does not explode, but the result is nearly the same as with no inhibition at all. For this reason, in practice most people use a non-inhibitory adaptation function such as a 2D Gaussian.

```
 0.000 -0.010 -0.010 -0.010  0.000
-0.010  0.500  0.500  0.500 -0.010
-0.010  0.500  1.000  0.500 -0.010
-0.010  0.500  0.500  0.500 -0.010
 0.000 -0.010 -0.010 -0.010  0.000
```

## SOM and isolated input data clusters

A drawback of the SOM is its fixed topology. With a fixed topology, each node has some neighbors (defined by the topology) that are always adapted - even if a neighbor represents a very different prototype vector.

One can see this effect when the input data does not stem from a single coherent region in the high-dimensional space, but from several isolated regions:

Two isolated regions: Swiss roll + one rectangle

Three isolated regions: Swiss roll + two rectangles

Some nodes are attracted by samples from the Swiss roll and one of the rectangles at the same time, such that they are pulled to a prototype vector somewhere between both regions - but no input data actually stems from that region! In this sense, the representation becomes erroneous.
