The GMM Algorithm

    Tech · 2025-10-10

    Gaussian Mixture Models

    A Gaussian mixture model is a generative model. From a geometric point of view, the probability density of the data is a weighted average of several Gaussian densities:

    $$
    p(x)=\sum_{k=1}^{K}\alpha_k N(x\mid\mu_k,\Sigma_k),\qquad \sum_{k=1}^K\alpha_k=1
    $$
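    The weighted-average density above can be evaluated directly. A minimal NumPy sketch (the names `gauss_pdf` and `gmm_density` are illustrative, not from the text):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(x | mu, Sigma) at a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def gmm_density(x, alphas, mus, sigmas):
    """p(x) = sum_k alpha_k N(x | mu_k, Sigma_k)."""
    return sum(a * gauss_pdf(x, m, S)
               for a, m, S in zip(alphas, mus, sigmas))

# Two 1-D components; the weights alpha_k sum to 1.
alphas = [0.3, 0.7]
mus = [np.array([0.0]), np.array([4.0])]
sigmas = [np.array([[1.0]]), np.array([[2.0]])]
p = gmm_density(np.array([1.0]), alphas, mus, sigmas)
```

    With a single standard-normal component, `gmm_density` at 0 reduces to $1/\sqrt{2\pi}$, which is a quick sanity check.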

    From the mixture-model point of view, $x$ is the observed variable and $z$ is a latent variable indicating which Gaussian component $x$ was generated from. Then

    $$
    \begin{aligned}
    p(x)&=\sum_z p(x,z)=\sum_{k=1}^K p(x,z=c_k)\\
    &=\sum_{k=1}^K p(z=c_k)\,p(x\mid z=c_k)\\
    &=\sum_{k=1}^K p_k N(x\mid\mu_k,\Sigma_k)
    \end{aligned}
    $$

    where $p_k=p(z=c_k)$ plays the role of the weight $\alpha_k$ above.

    In a Gaussian mixture model the parameters to estimate are $p_k,\mu_k,\Sigma_k$, i.e. $\theta=\{p_1,\dots,p_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K\}$. Let the observed data be $X=\{x_1,x_2,\dots,x_N\}$ and let $(X,Z)$ be the complete data. If we try to solve for the parameters by maximum likelihood, the objective is

    $$
    \begin{aligned}
    \hat{\theta}&=\argmax_{\theta}\log P(X)=\argmax_{\theta}\log\prod_{i=1}^N P(x_i)\\
    &=\argmax_{\theta}\sum_{i=1}^N\log\sum_{k=1}^K p_k N(x_i\mid\mu_k,\Sigma_k)
    \end{aligned}
    $$
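    The log-likelihood objective above is easy to evaluate numerically even though it has no closed-form maximizer. A minimal sketch (function names are illustrative):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def log_likelihood(X, p, mus, sigmas):
    """log P(X) = sum_i log sum_k p_k N(x_i | mu_k, Sigma_k)."""
    return sum(np.log(sum(pk * gauss_pdf(x, m, S)
                          for pk, m, S in zip(p, mus, sigmas)))
               for x in X)

# One standard-normal component evaluated at x = 0:
ll = log_likelihood([np.array([0.0])], [1.0],
                    [np.array([0.0])], [np.array([[1.0]])])
```

    Note the sum sits *inside* the log, which is exactly why differentiating this expression directly fails, as the next paragraph explains.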

    However, the $\log$ in the last expression contains a sum, so differentiating it directly with respect to $\theta$ yields no analytic solution. We therefore use the EM algorithm to solve for the parameters of the Gaussian mixture model iteratively.

    EM: E-step and M-step

    The EM update is:

    $$
    \begin{aligned}
    \theta^{t+1}&=\argmax_{\theta}E_{Z\mid X,\theta^{t}}\left[\log P(X,Z\mid\theta)\right]\\
    &=\argmax_{\theta}Q(\theta,\theta^{t})
    \end{aligned}
    $$

    where

    $$
    \begin{aligned}
    Q(\theta,\theta^t)&=\sum_Z\log P(X,Z\mid\theta)\,P(Z\mid X,\theta^t)\\
    &=\sum_{z_1,\dots,z_N}\log\prod_{i=1}^N P(x_i,z_i\mid\theta)\prod_{j=1}^N P(z_j\mid x_j,\theta^t)\\
    &=\sum_{z_1,\dots,z_N}\left(\sum_{i=1}^N\log P(x_i,z_i\mid\theta)\right)\prod_{j=1}^N P(z_j\mid x_j,\theta^t)\\
    &=\sum_{i=1}^N\sum_{z_i}\log P(x_i,z_i\mid\theta)\,P(z_i\mid x_i,\theta^t)\\
    &=\sum_{i=1}^N\sum_{z_i}\left(\log p_{z_i}+\log N(x_i\mid\mu_{z_i},\Sigma_{z_i})\right)P(z_i\mid x_i,\theta^t)\\
    &=\sum_{i=1}^N\sum_{k=1}^K\left(\log p_k+\log N(x_i\mid\mu_k,\Sigma_k)\right)P(z_i=c_k\mid x_i,\theta^t)
    \end{aligned}
    $$

    The fourth line follows because, for each $i$, the factors $P(z_j\mid x_j,\theta^t)$ with $j\neq i$ sum to 1 when marginalized over $z_j$.

    Here the posterior

    $$
    P(z_i=c_k\mid x_i,\theta^t)=\frac{p_k^t\,N(x_i\mid\mu_k^t,\Sigma_k^t)}{\sum_{k'=1}^K p_{k'}^t N(x_i\mid\mu_{k'}^t,\Sigma_{k'}^t)}
    $$

    is computed entirely from $\theta^t$ and does not depend on $\theta$. Computing it for every $i,k$ is the E-step.
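    The posterior responsibilities can be computed exactly as the formula above says: the numerator $p_k^t N(x_i\mid\mu_k^t,\Sigma_k^t)$ for every $(i,k)$, then a per-row normalization. A minimal sketch (the name `e_step` and the array `gamma` are illustrative):

```python
import numpy as np

def e_step(X, p, mus, sigmas):
    """gamma[i, k] = P(z_i = c_k | x_i, theta^t)."""
    gamma = np.zeros((len(X), len(p)))
    for i, x in enumerate(X):
        for k in range(len(p)):
            d = len(mus[k])
            diff = x - mus[k]
            dens = np.exp(-0.5 * diff @ np.linalg.inv(sigmas[k]) @ diff) \
                   / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigmas[k]))
            # Numerator p_k^t N(x_i | mu_k^t, Sigma_k^t)
            gamma[i, k] = p[k] * dens
    # Divide each row by its sum over k: the denominator of the posterior.
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Two identical components: every responsibility should be 1/2.
X = [np.array([0.0]), np.array([1.0])]
gamma = e_step(X, [0.5, 0.5],
               [np.array([0.0]), np.array([0.0])],
               [np.array([[1.0]]), np.array([[1.0]])])
```

    Each row of `gamma` sums to 1 by construction, mirroring the normalization in the posterior's denominator.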

    We first solve for $p_k^{(t+1)}$:

    $$
    p_k^{(t+1)}=\argmax_{p_k}\sum_{i=1}^N\sum_{k=1}^K\log p_k\,P(z_i=c_k\mid x_i,\theta^t)
    \quad\text{s.t.}\quad\sum_{k=1}^K p_k=1
    $$

    Turning this constrained optimization problem into a Lagrangian:

    $$
    \mathcal{L}(p,\lambda)=\sum_{i=1}^N\sum_{k=1}^K\log p_k\,P(z_i=c_k\mid x_i,\theta^t)+\lambda\left(\sum_{k=1}^K p_k-1\right)
    $$

    $$
    \frac{\partial\mathcal{L}(p,\lambda)}{\partial p_k}=\sum_{i=1}^N\frac{1}{p_k}P(z_i=c_k\mid x_i,\theta^t)+\lambda=0
    $$

    Multiplying both sides by $p_k$ and summing over $k$ gives

    $$
    \sum_{k=1}^K\sum_{i=1}^N P(z_i=c_k\mid x_i,\theta^t)+\lambda\sum_{k=1}^K p_k=0
    $$

    The first term is $N$ and the constraint makes the second term $\lambda$, so $\lambda=-N$ and

    $$
    p_k^{(t+1)}=\frac{\sum\limits_{i=1}^N P(z_i=c_k\mid x_i,\theta^t)}{N}
    $$

    For $\mu_k,\Sigma_k$ there are no constraints, so we can differentiate $Q(\theta,\theta^t)$ directly and set the gradients to zero.
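    The M-step above can be sketched in a few lines. The weight update is exactly the $p_k^{(t+1)}$ formula just derived; the $\mu_k,\Sigma_k$ updates below are the standard closed forms obtained from the unconstrained gradients of $Q$ (responsibility-weighted mean and covariance), stated here without the derivation the text leaves to the reader. `m_step` and `gamma` are illustrative names; `X` is an $(N,d)$ array and `gamma[i,k]` holds $P(z_i=c_k\mid x_i,\theta^t)$:

```python
import numpy as np

def m_step(X, gamma):
    """One M-step: maximize Q(theta, theta^t) given responsibilities gamma."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)        # sum_i P(z_i = c_k | x_i, theta^t), per k
    p = Nk / N                    # p_k^{t+1}, the update derived above
    # Responsibility-weighted means (standard closed form, assumption: see lead-in)
    mus = [gamma[:, k] @ X / Nk[k] for k in range(K)]
    # Responsibility-weighted covariances (standard closed form)
    sigmas = []
    for k in range(K):
        diff = X - mus[k]
        sigmas.append((gamma[:, k][:, None] * diff).T @ diff / Nk[k])
    return p, mus, sigmas

# Degenerate check: with one component, gamma is all ones, so the M-step
# must return weight 1, the sample mean, and the (biased) sample covariance.
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
p, mus, sigmas = m_step(X, np.ones((3, 1)))
```

    Alternating `e_step` and `m_step` until the log-likelihood stops improving is the complete EM loop for the GMM.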
