How to Implement Kohonen's Self-Organizing Maps

    ANN | SOM | SOFM | MATLAB

    Artificial neural networks (ANN) are inspired by the early models of sensory processing by the brain. An artificial neural network can be created by simulating a network of model neurons in a computer. By applying algorithms that mimic the processes of real neurons, we can make the network ‘learn’ to solve many types of problems. — Anders Krogh¹ (Nature Biotechnology)

    In postmodern life, we engage with an enormous number of astonishing artificial neural network applications in almost every activity, often with no idea of their capabilities and complexity. Artificial neural networks have been applied to problems ranging from speech recognition to the prediction of protein secondary structure, classification of cancers, and gene prediction¹. Since familiarity with these advanced techniques will be a necessity in the near future, we should get to know them better, and we can start our journey of understanding ANNs at a simple level.

    As a basic type of ANN, let’s consider the self-organizing map (SOM), or self-organizing feature map (SOFM), which is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map.

    Self-Organizing Map?

    It converts the nonlinear statistical relationships between high-dimensional data into simple geometric relationships of their image points on a low-dimensional display, usually a regular two-dimensional grid of nodes. The SOM thereby compresses information while preserving the most important topological and/or metric relationships of the primary data elements on the display — Teuvo Kohonen²

    🔎 Why SOM?

    Basically, SOMs are characterized as a nonlinear, ordered, smooth mapping of high-dimensional input data manifolds onto the elements of a regular, low-dimensional array². After training the SOM’s neurons we get a low-dimensional representation of the high-dimensional input data without disturbing the shape of the data distribution or the relationships between the input data elements. Self-organizing maps differ from other ANNs in that they apply unsupervised learning rather than error-correction learning (e.g., backpropagation with gradient descent), and in that they use a neighborhood function to preserve the topological properties of the input space. Given their simplicity, their capabilities are easy to explain and demonstrate. For a detailed explanation, please refer to Self-Organizing Maps by Teuvo Kohonen².

    💡 How does SOM work?

    SOM learning data representation — inputs are shown as blue dots and the model’s neuron values as red dots for epochs 1, 10, 50 and 100 (Image by Author)

    A simple illustration of the learning process is given in the figure above, and from this representation we can easily understand what makes SOMs special. Initially, the input data (blue dots) occupy a particular distribution in 2D space while the untrained neuron (weight) values (red dots) are randomly distributed in a small area; as the neurons are modified by the inputs, they take on the shape of the input data distribution step by step during learning. In addition, each neuron becomes the representative of one small cluster of the input data space. In this demonstration we were therefore able to represent 1000 data points with 100 neurons while preserving the topology of the input data. That means we have built a relationship between the high-dimensional data and a low-dimensional representation (the map). For further calculations and predictions we can use these few neuron values to represent the enormous input data space, which makes processing much faster.

    📄 Learning Algorithm

    As a basic model of SOM, we map from the n-dimensional input data space to a two-dimensional array of N neurons. This SOM can be implemented using the following procedure:

    🛠 Setup:

    P input vectors are available (i = 1, 2, …, P).

    The ith input vector has n elements: Xᵢ = (xᵢ1, xᵢ2, …, xᵢn)

    N neurons (nodes or weight vectors) are available (i = 1, 2, …, N).

    The ith neuron vector also has n elements: mᵢ = (mᵢ1, mᵢ2, …, mᵢn)

    These neuron vectors are arranged in a 2D matrix for representation.

    Assume all vector elements are real numbers.

    Setup of the SOM basic model — input vectors and neuron matrix (Image by Author)
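
    As a minimal MATLAB sketch of this setup (the variable names X, M and R, and the initialization choices, are my own assumptions, not necessarily those of the repository introduced below):

    n = 2;                        % input dimension
    P = 1000;                     % number of input vectors
    N = 100;                      % number of neurons (a 10 x 10 grid)
    T = 300;                      % number of training iterations
    X = rand(P, n);               % input vectors X_i, one per row
    M = 0.45 + 0.1*rand(N, n);    % neuron vectors m_i, started in a small random region
    [gx, gy] = meshgrid(1:sqrt(N), 1:sqrt(N));
    R = [gx(:), gy(:)];           % lattice positions r_i of the neurons on the 2D grid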

    🔖 Activity dynamics:

    For a given input Xᵢ, find the neuron closest to it (smallest Euclidean distance) and denote that neuron by the index c.

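    In MATLAB this is a one-line nearest-neighbor search (a sketch using the variables from the setup above; the implicit expansion in M - x requires R2016b or newer):

    x = X(1, :);                        % one input vector
    [~, c] = min(sum((M - x).^2, 2));   % c is the index of the closest neuron m_c
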
    ✏️ Learning dynamics:

    For a given input Xᵢ, after finding the winning neuron mc, update only the neighborhood neuron set of mc:

    mᵢ(t + 1) = mᵢ(t) + hci(t) [ Xᵢ − mᵢ(t) ]

    for t = 0, 1, 2, …, T, where T is the number of iterations over which the model is updated and mᵢ(0) can be an arbitrary initial vector. The function hci(t) is the so-called neighborhood function, a smoothing kernel defined over the lattice points (matrix elements).

    Since we only need to update the neurons around neuron mc, we first need to find the neighborhood set of lattice points around mc. A simple topological neighborhood method is given below; more advanced smooth neighborhood methods can be found in the literature².

    Neighborhood set of lattice points within Nc(t) — the green neuron is the mc neuron and the blue neurons form the neighborhood set that will be updated (Image by Author)

    All the neurons inside the circle of radius Nc(t) are considered the neighborhood set of neuron mc. The neighborhood radius Nc(t) usually decreases monotonically with the iteration count t. We usually start with Nc(0) = √N/2 and reduce the radius at each iteration.

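    As a sketch, the neighborhood set can be selected by thresholding lattice distances against a shrinking radius (a linear schedule is assumed here; the actual implementation may use a different one):

    radius = (sqrt(N)/2) * (1 - t/T);            % Nc(t), starting from Nc(0) = sqrt(N)/2
    dist2c = sqrt(sum((R - R(c, :)).^2, 2));     % lattice distance from each neuron to m_c
    neighbors = find(dist2c <= radius);          % indices of the neighborhood set Nc(t)
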
    After finding the set of neighborhood neurons that need to be updated, we can use the following hci(t) function to update the neurons around mc:

    hci(t) = α(t) · exp( −‖rc − rᵢ‖² / (2σ²(t)) )

    In this equation, ‖rc − rᵢ‖ is the distance between the neurons’ positions in the 2D lattice (a √N × √N matrix). The value α(t) is the learning-rate factor (0 < α(t) < 1). Both α(t) and σ(t) are monotonically decreasing functions of time, for example decaying linearly from their initial values over the T iterations.

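    A sketch of the kernel as MATLAB anonymous functions (the specific linear decay schedules and initial values here are assumptions, chosen only for illustration):

    alpha = @(t) 0.9 * (1 - t/T);                    % learning-rate factor, decays toward 0
    sigma = @(t) 1 + (sqrt(N)/2 - 1) * (1 - t/T);    % kernel width, shrinks toward 1
    h_ci  = @(c, i, t) alpha(t) * exp(-sum((R(c,:) - R(i,:)).^2) / (2 * sigma(t)^2));
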
    Apply this learning step to each and every input data vector (all P of them), and then repeat the whole pass over the input data for T iterations. After T iterations you will have a fully trained neuron matrix that maps your input data values.

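    Putting the pieces together, a compact end-to-end training loop might look as follows (a sketch assembled from the snippets above, not the exact code of the repository; for simplicity it updates all neurons weighted by the Gaussian kernel, so restricting the update to the hard neighborhood set becomes an optional optimization):

    for t = 0:T-1
        for p = 1:P
            x = X(p, :);
            [~, c] = min(sum((M - x).^2, 2));    % activity dynamics: find the BMU
            for k = 1:N                          % learning dynamics: neighborhood update
                M(k, :) = M(k, :) + h_ci(c, k, t) * (x - M(k, :));
            end
        end
    end
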
    📦 MATLAB Implementation

    Using the above algorithm, a few interesting examples mentioned in the book Self-Organizing Maps by Teuvo Kohonen² have been implemented in MATLAB, and you can clone them to your local computer as follows:

    git clone https://github.com/KosalaHerath/kohonen-som.git

    Let’s define the repository’s home as <REPO_HOME>. Then go to the following location, where you will find three example MATLAB implementations that you can run with any MATLAB version on your computer:

    <REPO_HOME>/source/kohonen_examples

    Otherwise, you can go directly to the implementation repository: https://github.com/KosalaHerath/kohonen-som

    These examples are initialized with uniformly distributed random 2-dimensional (n = 2) input data vectors, with 1000 input data values (P = 1000). In addition, we have defined the number of neurons as N = 10 × 10 = 100 and the number of iterations as T = 300. You can change these parameters and play with the model using the above implementations.

    📍 Example 1: Square Input Distribution

    This example’s input data values are randomly distributed over a square region of the 2-dimensional space.

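    In the notation of the sketches above, this input set is simply uniform samples over the unit square:

    X = rand(P, 2);    % P points uniformly distributed over the unit square
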
    SOM learning data representation for the square input distribution — inputs are shown as blue dots and the model’s neuron values as red dots for epochs 1, 50, 250 and 300 (Image by Author)

    📍 Example 2: Triangle Input Distribution

    This example’s input data values are randomly distributed over a triangular region of the 2-dimensional space.

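    One standard way to sample uniformly inside a triangle is the reflection trick sketched below (an assumption for illustration; the repository may generate its inputs differently):

    X = rand(P, 2);                  % start from uniform points in the unit square
    flip = sum(X, 2) > 1;            % points above the diagonal x + y = 1
    X(flip, :) = 1 - X(flip, :);     % reflect them into the lower-left triangle
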
    SOM learning data representation for the triangle input distribution — inputs are shown as blue dots and the model’s neuron values as red dots for epochs 1, 50, 250 and 300 (Image by Author)

    📍 Example 3: Triangle Input Distribution with a 1D Neuron Array

    This example’s input data values are randomly distributed over a square region of the 2-dimensional space, and specifically we consider a 1D neuron array instead of a 2D matrix. The line formed by the neuron array therefore tries to cover the whole input data distribution, as follows.

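    Only the lattice coordinates change for the 1D variant: the neurons sit on a chain rather than a grid, so the lattice distance in hci(t) becomes a difference of chain indices (again a sketch using the assumed variable names from above):

    R = (1:N)';    % N x 1 chain positions replace the sqrt(N) x sqrt(N) grid
    h_ci = @(c, i, t) alpha(t) * exp(-(R(c) - R(i))^2 / (2 * sigma(t)^2));
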
    SOM learning data representation for the 1D neuron array — inputs are shown as blue dots and the model’s neuron values as red dots for epochs 1, 50, 250 and 300 (Image by Author)

    Now you can learn and play with these implementations using different inputs and modifications; the more you experiment, the better you will understand. Please suggest any modifications that would improve these implementations.

    Cheers! 🍺

    🗞 References

    [1] Krogh, A. (2008). What are artificial neural networks? Nature Biotechnology, 26(2), pp.195–197.

    [2] Kohonen, T. (2001). Self-Organizing Maps. New York: Springer.

    Translated from: https://towardsdatascience.com/how-to-implement-kohonens-self-organizing-maps-989c4da05f19
