Bayesian Gaussian Mixture Models

    A quick practical guide to coding Gaussian mixture models in Infer.NET.

    In this post, I will provide a brief introduction to Bayesian Gaussian mixture models and share my experience of building these types of models in Microsoft’s Infer.NET probabilistic graphical model framework. Being comfortable and familiar with k-means clustering and Python, I found it challenging to learn C#, Infer.NET and some of the underlying Bayesian principles used in probabilistic inference. My hope is that this article will save you time, remove any intimidation that the theory may bring and demonstrate some of the advantages of what is known as the model-based machine learning (MBML) approach. Please follow the instructions provided in the Infer.NET documentation to get set up with the Infer.NET framework.

    Bayesian Gaussian mixture models constitute a form of unsupervised learning and can be useful in fitting multi-modal data for tasks such as clustering, data compression, outlier detection, or generative classification. Each Gaussian component is usually a multivariate Gaussian with a mean vector and covariance matrix. For the sake of demonstration we will consider a simple univariate case.

    Let us sample data from a univariate Gaussian distribution and store the data in a .csv file using Python code:

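    A minimal sketch of such a sampler is shown below (assuming numpy and pandas; the sample_mixture helper, the random seed and the file name data.csv are illustrative rather than the original code). It is written as a general mixture sampler so that we can reuse it later in the post by changing only the mixture weights p:

        import numpy as np
        import pandas as pd

        np.random.seed(0)  # illustrative seed; the original is not specified

        def sample_mixture(n, means, precisions, p, path="data.csv"):
            """Sample n points from a univariate Gaussian mixture and store them in a .csv file."""
            means, precisions, p = map(np.asarray, (means, precisions, p))
            stds = 1.0 / np.sqrt(precisions)               # precision is the inverse of variance
            z = np.random.choice(len(means), size=n, p=p)  # pick a component for each point
            x = np.random.normal(loc=means[z], scale=stds[z])
            pd.DataFrame({"x": x}).to_csv(path, index=False)
            return x

        # first data set: a single Gaussian with mean=5 and precision=10
        x = sample_mixture(100, means=[1, 3, 5], precisions=[10, 10, 10], p=[0, 0, 1])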

    This is what our data looks like:

    Left: a plot of 100 data points sampled from a Gaussian distribution with mean=5 and precision=10. Right: a histogram of the data.

    Let us pretend for a moment that we did not know the distribution that generated our data set. We visualise the data and make the assumption that the data was generated by a Gaussian distribution. In other words, we hope that a Gaussian distribution can sufficiently describe our data set. However, we do not know the location or the spread of this Gaussian distribution. A Gaussian distribution can be parameterised by a mean and a variance parameter. Sometimes it is mathematically easier to use a mean and a precision, where precision is simply the inverse of variance (for example, a variance of 0.1 corresponds to a precision of 10). We will stick with precision; the intuition is that the higher the precision, the narrower (or more “certain”) the spread of the Gaussian distribution.

    Firstly, we are interested in finding the mean parameter of this Gaussian distribution, and we will pretend that we know the value of its precision (we set precision=1). In other words, we think our data is Gaussian distributed and we are unsure what its mean parameter is, but we feel confident that it has precision=1. Can we learn its mean parameter from the data? It turns out we need a second Gaussian distribution to describe the mean of our first Gaussian distribution. This is known as a conjugate prior. Here is a graphical representation of learning the unknown mean (using a Gaussian prior with parameters mean=0, precision=1):

    Bayes network for learning the mean of data with known precision.

    Notice the difference between the mean random variable and the known precision in the graph. Here is the code in Infer.NET:

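    A minimal sketch of this model in Infer.NET (assuming the Microsoft.ML.Probabilistic packages; data is the double[] loaded from the .csv file, and the variable names are illustrative):

        using System;
        using System.IO;
        using System.Linq;
        using Microsoft.ML.Probabilistic.Models;
        using Microsoft.ML.Probabilistic.Distributions;
        using Microsoft.ML.Probabilistic.Algorithms;

        // load the sampled points (assumes a single "x" column with a header row)
        double[] data = File.ReadAllLines("data.csv").Skip(1).Select(double.Parse).ToArray();

        // Gaussian prior over the unknown mean (mean=0, precision=1)
        Variable<double> mean = Variable.GaussianFromMeanAndPrecision(0, 1);

        // each data point is Gaussian distributed with the unknown mean and a known precision of 1
        Range n = new Range(data.Length);
        VariableArray<double> x = Variable.Array<double>(n);
        x[n] = Variable.GaussianFromMeanAndPrecision(mean, 1).ForEach(n);
        x.ObservedValue = data;

        // infer the posterior over the mean with variational message passing
        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        Console.WriteLine("Posterior Gaussian (Gaussian mean): " + engine.Infer<Gaussian>(mean));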

    Posterior Gaussian (Gaussian mean): Gaussian(4.928, 0.009901)

    After observing only 100 data points we now have a posterior Gaussian distribution, which depicts the mean of our data x. We have learned something useful from the data! But wait… we can also learn something about its precision, without having to pretend it is fixed at 1. How? We do the same thing we did to the mean and place a distribution over the precision (effectively removing our “infinitely confident” knowledge that it was equal to 1 by replacing it with something resembling our “uncertainty”). The conjugate prior for precision is the Gamma distribution. We update our graphical representation of the model by including the Gamma distribution (with prior parameters shape=2, rate=1) over a new precision random variable:

    Bayes network for learning the mean and precision of data.

    Here is the code in Infer.NET:

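    A sketch of the extended model (same assumptions and data loading as before; the only change is that the precision goes from a constant to a random variable with a Gamma prior):

        // priors over the unknown mean and the unknown precision
        Variable<double> mean = Variable.GaussianFromMeanAndPrecision(0, 1);
        Variable<double> precision = Variable.GammaFromShapeAndRate(2, 1);

        // each data point is Gaussian distributed with the unknown mean and unknown precision
        Range n = new Range(data.Length);
        VariableArray<double> x = Variable.Array<double>(n);
        x[n] = Variable.GaussianFromMeanAndPrecision(mean, precision).ForEach(n);
        x.ObservedValue = data;

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        Console.WriteLine("Posterior Gaussian (Gaussian mean): " + engine.Infer<Gaussian>(mean));
        Console.WriteLine("Posterior Gamma (Gaussian precision): " + engine.Infer<Gamma>(precision));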

    Posterior Gamma (Gaussian precision): Gamma(52, 0.1499)[mean=7.797]

    A recap of our assumptions up to this point (referring to the figures below):

    - the data x is Gaussian distributed,
    - we pretended to have full knowledge of its precision (precision=1) and learned its mean by using a Gaussian prior,
    - we then stopped pretending to “know” the precision and learned the precision by using a Gamma prior. Notice the difference this makes in the figure on the left below. The first model could not learn the precision due to the restriction we imposed (shown in green).
    - the parameters for an unknown mean and unknown precision are Gauss-Gamma distributed. After learning from the 100 data points, the prior distribution over these parameters (shown in red) updated to the posterior distribution (shown in blue in the figure on the right below).

    Left: a Gaussian distribution with known precision (green) and unknown precision (blue). Right: a prior Gauss-Gamma distribution over the mean and precision parameters (red) and posterior distribution after learning from 100 data points (blue).

    Infer.NET can produce a factor graph of our model when setting ShowFactorGraph = true. Factor nodes are shown in black boxes and variable nodes are shown in white boxes. This graph shows our data x (the observed variable array at the bottom), which depends on a Gaussian factor. The Gaussian factor depends on a random variable called mean, and a random variable called precision. These random variables depend on a Gaussian prior and a Gamma prior respectively. The parameter values of both prior distributions are shown at the top of the graph.

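    For example (a sketch; ShowFactorGraph is a property of the inference engine, and the graph is rendered when the model is compiled at inference time):

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        engine.ShowFactorGraph = true;                      // render the factor graph of the compiled model
        Gaussian posterior = engine.Infer<Gaussian>(mean);  // inference triggers model compilation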

    Infer.NET produced factor graph of our model.

    We made certain assumptions in order to learn what the mean and precision of the Gaussian distribution are. In MBML, learning and inference are essentially the same. You can read more on the supported Infer.NET inference techniques and their differences here. The examples in this article make use of variational message passing (VMP). In order to learn more complex distributions (e.g., multi-modal densities) this model will not be expressive enough and should be extended by introducing more assumptions. Ready to mix things up?

    As with many things in life, if one Gaussian is good, more should be better, right? First, we need a new data set, generated with the same Python code introduced at the start of the post. The only difference is that we set p=[0.4, 0.2, 0.4]. This means that 80% of the data should be sampled from the first and third Gaussian distributions, while 20% of the data should be sampled from the second.
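
    For example, using the hypothetical sample_mixture helper sketched at the start of the post:

        # second data set: three components with means=[1, 3, 5] and precisions=[10, 10, 10]
        x = sample_mixture(100, means=[1, 3, 5], precisions=[10, 10, 10], p=[0.4, 0.2, 0.4])

    The data can be visualised: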

    Left: a plot of 100 data points sampled from three different Gaussian distributions with means=[1, 3, 5] and precisions=[10, 10, 10]. Right: a histogram of the data.

    To create the model we will use k=3 Gaussian distributions, also known as components, to fit our data set. In other words, we have three mean random variables and three precision random variables that we need to learn, but we also need a latent random variable z. This random variable has a discrete distribution and is responsible for selecting the component that best describes its associated observed x value. For example, more weight should be assigned to state one of z₀ if the observed x₀ is best explained by Gaussian component one. In this example, we will pretend to know the mixture weights responsible for all data points and use a uniform assignment (w₀=1/3, w₁=1/3, w₂=1/3) as shown in the graph below.

    Bayes network for learning a mixture of Gaussian distributions with known mixture weights.

    Here is the code in Infer.NET:

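    A sketch of the mixture model (same assumptions as before, plus Microsoft.ML.Probabilistic.Math for Vector and Rand; the priors are illustrative, and the random initialisation of z is one common way to break the symmetry between the otherwise identical components, a topic touched on at the end of the post):

        // k = 3 components, each with its own unknown mean and precision
        Range k = new Range(3);
        VariableArray<double> means = Variable.Array<double>(k);
        means[k] = Variable.GaussianFromMeanAndPrecision(0, 1).ForEach(k);
        VariableArray<double> precisions = Variable.Array<double>(k);
        precisions[k] = Variable.GammaFromShapeAndRate(2, 1).ForEach(k);

        // known, uniform mixture weights (w0 = w1 = w2 = 1/3)
        Variable<Vector> weights = Variable.Observed(Vector.FromArray(1.0 / 3, 1.0 / 3, 1.0 / 3));
        weights.SetValueRange(k);

        // the latent variable z selects, per data point, the component that generated it
        Range n = new Range(data.Length);
        VariableArray<double> x = Variable.Array<double>(n);
        VariableArray<int> z = Variable.Array<int>(n);
        using (Variable.ForEach(n))
        {
            z[n] = Variable.Discrete(weights);
            using (Variable.Switch(z[n]))
            {
                x[n] = Variable.GaussianFromMeanAndPrecision(means[z[n]], precisions[z[n]]);
            }
        }
        x.ObservedValue = data;

        // initialise the assignments randomly to break symmetry between the components
        Discrete[] zInit = new Discrete[data.Length];
        for (int i = 0; i < zInit.Length; i++)
            zInit[i] = Discrete.PointMass(Rand.Int(k.SizeAsInt), k.SizeAsInt);
        z.InitialiseTo(Distribution<int>.Array(zInit));

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        Gaussian[] meanPost = engine.Infer<Gaussian[]>(means);
        Gamma[] precisionPost = engine.Infer<Gamma[]>(precisions);
        for (int c = 0; c < k.SizeAsInt; c++)
        {
            Console.WriteLine("Posterior Gaussian (Gaussian mean): " + meanPost[c]);
            Console.WriteLine("Posterior Gamma (Gaussian precision): " + precisionPost[c]);
        }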

    Posterior Gamma (Gaussian precision): Gamma(25.43, 0.2547)[mean=6.477]
    Posterior Gaussian (Gaussian mean): Gaussian(5.045, 0.004061)
    Posterior Gamma (Gaussian precision): Gamma(17, 0.4812)[mean=8.178]
    Posterior Gaussian (Gaussian mean): Gaussian(2.889, 0.007502)
    Posterior Gamma (Gaussian precision): Gamma(13.58, 0.4209)[mean=5.715]

    The three Gaussian distributions/components learned from the data are plotted below:

    The three learned Gaussian distributions and their sum with known weights set to 1/3.

    Hold on… we claimed to have full knowledge of the component weights, but what if we do not? Can we also learn the weights from the data? Indeed, but we need a prior! The Dirichlet distribution is the conjugate prior for the discrete/categorical distribution. The graph below is updated to show a random variable for the unknown weights with its accompanying Dirichlet prior.

    Bayes network for learning a mixture of Gaussian distributions with unknown mixture weights.

    Here is the code in Infer.NET:

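    Relative to the previous sketch, only the definition of the weights changes: the observed vector becomes a random variable with a symmetric Dirichlet prior, and its posterior is inferred alongside the means and precisions (again a sketch, with the same assumptions as above):

        // unknown mixture weights with a symmetric Dirichlet prior
        Variable<Vector> weights = Variable.Dirichlet(k, new double[] { 1, 1, 1 });

        // ... the rest of the model is unchanged: z[n] = Variable.Discrete(weights), and so on

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        Console.WriteLine("Posterior weight distribution: " + engine.Infer<Dirichlet>(weights).GetMean());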

    Posterior Gamma (Gaussian precision): Gamma(25.04, 0.2663)[mean=6.667]
    Posterior Gaussian (Gaussian mean): Gaussian(2.719, 0.02028)
    Posterior Gamma (Gaussian precision): Gamma(10.1, 0.3655)[mean=3.693]
    Posterior Gaussian (Gaussian mean): Gaussian(4.513, 0.02233)
    Posterior Gamma (Gaussian precision): Gamma(20.86, 0.06266)[mean=1.307]
    Posterior weight distribution: 0.4563 0.168 0.3757

    The three Gaussian distributions/components learned from the data and their learned weights are illustrated below:

    The three learned Gaussian distributions and their sum with learned weights = {0.45, 0.16, 0.37}.

    In summary, we started this journey by assuming our first data set could be sufficiently described by a Gaussian distribution. We were able to learn the mean and precision parameters of the Gaussian distribution using the observed data and VMP inference in Infer.NET. Our second data set used a more complex generating mechanism, which required a more expressive model. We then introduced a latent variable z and a Dirichlet prior, which allowed us to learn mixtures of Gaussian distributions and their mixture weights. All steps are provided in C# code using Infer.NET and can be accessed here.

    For a more formal treatment, the following books and links are recommended:

    https://dotnet.github.io/infer/InferNet101.pdf

    Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.

    Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

    http://mbmlbook.com/index.html

    http://www.jmlr.org/papers/volume6/winn05a/winn05a.pdf

    https://en.wikipedia.org/wiki/Exponential_family

    Important concepts that were not mentioned in this post:

    - appropriate prior parameters,
    - identifiability,
    - breaking symmetry,
    - message-passing (variational message passing (VMP) & expectation propagation (EP)),
    - the Wishart conjugate prior of the inverse covariance matrix,
    - Occam’s razor and the Dirichlet prior.

    Translated from: https://medium.com/@jacowp357/bayesian-gaussian-mixture-models-without-the-math-using-infer-net-7767bb7494a0
