空间可变高斯混合模型(拉普拉斯正则化 平滑)

以下推导,理解为在若干张图像中某一像素点的高斯混合模型推导过程。

空间可变高斯混合模型,基础理论认为所有采样点都来自K个不同的分布,但是每一个点上的权重不一样;现阶段采用的高斯混合模型,认为每一个点都有不同的分布,每个分布对应不同的权重。

混合分布是多个概率密度函数的线性组合,形式如下:

p(xΘ)=k=1Mαkpk(xθk),  andk=1Mαk=1(1.1)p(x \mid \Theta) = \sum_{k=1}^M \alpha_k \, p_k(x \mid \theta_k), \; \text{and} \sum_{k=1}^M \alpha_k = 1 \qquad (1.1)

对于 Θ=(α1,,αM,θ1,,θk)\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_k),表示该混合分布由 MM 个分布构成,每个分布的权重为 αk\alpha_k。当每个分布均服从高斯分布时,则称该混合分布为 MM 个分布的高斯混合模型。

假设样本观测值为 X={xi},1iNX = \{ x_i \}, 1 \le i \le N,则由上式可知,高斯混合模型中每个高斯分布的概率密度函数 pk(xθk)p_k(x \mid \theta_k) 为:

pk(xθk)=12πσk2e(xiμk)22σk2(1.2)p_k(x \mid \theta_k) = \frac{1}{\sqrt{2\pi \sigma_k^2}} \, e^{- \frac{(x_i - \mu_k)^2}{2\sigma_k^2}} \qquad (1.2)

对于 NN 个独立同分布的样本观察值,它们的联合分布有:

p(XΘ)=i=1Np(xiΘ)(1.3)p(X \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta) \qquad (1.3)

它们的对数似然函数可以写成:

lnL(ΘX)=lni=1Np(xiΘ)=i=1Nln[k=1Mαkpk(xiθk)](1.4)\ln L(\Theta \mid X) = \ln \prod_{i=1}^{N} p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln \left[ \sum_{k=1}^{M} \alpha_k \, p_k(x_i \mid \theta_k) \right] \qquad (1.4)

我们的目的是求 Q=argmaxΘL(ΘX)Q = \arg\max_{\Theta} L(\Theta \mid X) 的极大值,从数学的基础理论上来分析,可以对上述对数似然函数进行求导,从而寻求极值。但是这是比较困难的任务,因为它含有对数函数和多项式求和。为了克服这个困难,引入隐变量 Y={yi},1iNY = \{ y_i \}, 1 \le i \le N,并且对于每个隐变量 yi{1,2,,M}y_i \in \{1, 2, \ldots, M\}yi=ky_i = k 时,表示第 ii 个样本观测值 xix_i 是高斯混合分布的第 kk 个分布产生的。因此,引入隐变量 YY 后,对数似然函数可以改写成为

lnL(ΘX,Y)=lni=1Np(xi,yiΘ)=i=1Nln[αyipyi(xiθyi)](1.5)\ln L(\Theta \mid X, Y) = \ln \prod_{i=1}^{N} p(x_i, y_i \mid \Theta) = \sum_{i=1}^{N} \ln \left[ \alpha_{y_i} \, p_{y_i}(x_i \mid \theta_{y_i}) \right] \qquad (1.5)

改写似然函数后,可以采用 EM 算法来对模型进行参数估计。

假设在第 t1t-1 次迭代开始,有 Θ\Theta 的估计 Θ^t1=(α1t1,,αMt1,θ1t1,,θMt1)\hat{\Theta}^{t-1} = (\alpha_1^{t-1}, \ldots, \alpha_M^{t-1}, \theta_1^{t-1}, \ldots, \theta_M^{t-1})

在 EM 算法的 E 步,求完全数据的对数似然函数的期望,如下:

Q(ΘΘt1)=E[lnL(ΘX,Y)]=yln[L(ΘX,y)]p(yX,Θt1)=yi=1Nln[αyipyi(xiθyi)]p(yxi,Θt1)(1.6)\begin{aligned} Q(\Theta \mid \Theta^{t-1}) &= \mathbb{E} \left[ \ln L(\Theta \mid X, Y) \right] \\ &= \sum_{y} \ln \left[ L(\Theta \mid X, y) \right] \, p(y \mid X, \Theta^{t-1}) \\ &= \sum_{y} \sum_{i=1}^{N} \ln \left[ \alpha_{y_i} \, p_{y_i}(x_i \mid \theta_{y_i}) \right] \, p(y \mid x_i, \Theta^{t-1}) \end{aligned} \qquad (1.6)

已知第 ii 个观测值 xix_i 来自第 kk 个分布的概率为 p(yi=kxi,Θt1)p(y_i = k \mid x_i, \Theta^{t-1})。因此上式可以写成:

Q(ΘΘt1)=k=1Mi=1Nln[αkpk(xiθk)]p(kxi,Θt1)=k=1Mi=1Nln(αk)p(kxi,Θt1)+k=1Mi=1Nln[pk(xiθk)]p(kxi,Θt1)(1.7)\begin{aligned} Q(\Theta \mid \Theta^{t-1}) &= \sum_{k=1}^{M} \sum_{i=1}^{N} \ln \left[ \alpha_k \, p_k(x_i \mid \theta_k) \right] \, p(k \mid x_i, \Theta^{t-1}) \\ &= \sum_{k=1}^{M} \sum_{i=1}^{N} \ln (\alpha_k) \, p(k \mid x_i, \Theta^{t-1}) + \sum_{k=1}^{M} \sum_{i=1}^{N} \ln \left[ p_k(x_i \mid \theta_k) \right] \, p(k \mid x_i, \Theta^{t-1}) \end{aligned} \qquad (1.7)

而由贝叶斯公式可知

p(kxi,Θt1)=p(k,xiθkt1)s=1Mp(s,xiθst1)=pk(xiθkt1)s=1Mps(xiθst1)(1.8)p(k \mid x_i, \Theta^{t-1}) = \frac{p(k, x_i \mid \theta_k^{t-1})} {\sum\limits_{s=1}^{M} p(s, x_i \mid \theta_s^{t-1})} = \frac{p_k(x_i \mid \theta_k^{t-1})} {\sum\limits_{s=1}^{M} p_s(x_i \mid \theta_s^{t-1})} \qquad (1.8)

其中,pk(xiθk)p_k(x_i \mid \theta_k) 由式 (1.2) 给出。

在对均值、方差和权重都加入空间正则化后,QQ 函数可以写成:

Q(ΘΘt1)=k=1Mi=1Nln(αk)p(kxi,Θt1)+k=1Mi=1Nln[pk(xiθk)]p(kxi,Θt1)λμk=1MN(i)(μk(i)μk(j))2λσk=1MN(i)(σk2(i)σk2(j))2λαk=1MN(i)(lnαk(i)lnαk(j))2(1.9)\begin{align} Q(\Theta \mid \Theta^{t-1}) &= \sum_{k=1}^{M} \sum_{i=1}^{N} \ln (\alpha_k) \, p(k \mid x_i, \Theta^{t-1}) + \sum_{k=1}^{M} \sum_{i=1}^{N} \ln \left[ p_k(x_i \mid \theta_k) \right] \, p(k \mid x_i, \Theta^{t-1}) \\ &- \lambda_\mu \sum_{k=1}^{M} \sum_{\mathcal{N}(i)} (\mu_k(i) - \mu_k(j))^2 - \lambda_\sigma \sum_{k=1}^{M} \sum_{\mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j))^2 - \lambda_\alpha \sum_{k=1}^{M} \sum_{\mathcal{N}(i)} (\ln \alpha_k(i) - \ln \alpha_k(j))^2 \end{align} \qquad (1.9)

其中,λμ,λσ,λα\lambda_\mu, \lambda_\sigma, \lambda_\alpha 分别为均值、方差和权重的正则化系数,N(i)\mathcal{N}(i) 表示像素 jj 是像素 ii 的邻域。

首先,对均值 μk(i)\mu_k(i) 求偏导,可以将 Q(ΘΘt1)Q(\Theta \mid \Theta^{t-1})μk(i)\mu_k(i) 求偏导,并令其为零,有:

Q(ΘΘt1)μk(i)=μk(i)[i=1Nln[pk(xiθk)]p(kxi,Θt1)λμN(i)(μk(i)μk(j))2]=μk(i)[i=1N(xiμk(i))22σk2p(kxi,Θt1)]μk(i)[λμN(i)(μk(i)μk(j))2]=1σk2i=1N(xiμk(i))p(kxi,Θt1)2λμjN(i)(μk(i)μk(j))=0\begin{aligned} \frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \mu_k(i)} &= \frac{\partial}{\partial \mu_k(i)} \Bigg[ \sum_{i=1}^{N} \ln \left[ p_k(x_i \mid \theta_k) \right] \, p(k \mid x_i, \Theta^{t-1}) - \lambda_\mu \sum_{\mathcal{N}(i)} (\mu_k(i) - \mu_k(j))^2 \Bigg] \\ &= \frac{\partial}{\partial \mu_k(i)} \left[ \sum_{i=1}^{N} - \frac{(x_i - \mu_k(i))^2}{2\sigma_k^2} \, p(k \mid x_i, \Theta^{t-1}) \right] - \frac{\partial}{\partial \mu_k(i)} \left[ \lambda_\mu \sum_{\mathcal{N}(i)} (\mu_k(i) - \mu_k(j))^2 \right] \\ &= \frac{1}{\sigma_k^2} \sum_{i=1}^{N} (x_i - \mu_k(i)) \, p(k \mid x_i, \Theta^{t-1}) - 2 \lambda_\mu \sum_{j \in \mathcal{N}(i)} (\mu_k(i) - \mu_k(j)) \\ &= 0 \end{aligned}

其中,N(i)\mathcal{N}(i) 表示像素 ii 的邻域像素集合。由此可以得到更新 μk(i)\mu_k(i) 的方程:

μk(i)=i=1Nxip(kxi,Θt1)+2λμσk2jN(i)μk(j)i=1Np(kxi,Θt1)+2λμσk2N(i)(1.10)\mu_k(i) = \frac{\sum\limits_{i=1}^{N} x_i \, p(k \mid x_i, \Theta^{t-1}) + 2 \lambda_\mu \sigma_k^2 \sum\limits_{j \in \mathcal{N}(i)} \mu_k(j)}{\sum\limits_{i=1}^{N} p(k \mid x_i, \Theta^{t-1}) + 2 \lambda_\mu \sigma_k^2 |\mathcal{N}(i)|} \qquad (1.10)

其中,N(i)|\mathcal{N}(i)| 表示像素 ii 的邻域像素数量。

接下来,对方差 σk2(i)\sigma_k^2(i) 求偏导,可以将 Q(ΘΘt1)Q(\Theta \mid \Theta^{t-1})σk2(i)\sigma_k^2(i) 求偏导,并令其为零,有:

Q(ΘΘt1)σk2(i)=σk2(i)[i=1Nln[pk(xiθk)]p(kxi,Θt1)λσjN(i)(σk2(i)σk2(j))2]=σk2(i)[i=1N(12ln(2πσk2(i))(xiμk(i))22σk2(i))p(kxi,Θt1)]σk2(i)[λσjN(i)(σk2(i)σk2(j))2]=i=1N[12σk2(i)+(xiμk(i))22(σk2(i))2]p(kxi,Θt1)2λσjN(i)(σk2(i)σk2(j))=0\begin{aligned} \frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \sigma_k^2(i)} &= \frac{\partial}{\partial \sigma_k^2(i)} \Bigg[ \sum_{i=1}^{N} \ln \left[ p_k(x_i \mid \theta_k) \right] \, p(k \mid x_i, \Theta^{t-1}) - \lambda_\sigma \sum_{j \in \mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j))^2 \Bigg] \\ &= \frac{\partial}{\partial \sigma_k^2(i)} \left[ \sum_{i=1}^{N} \left( -\frac{1}{2} \ln(2 \pi \sigma_k^2(i)) - \frac{(x_i - \mu_k(i))^2}{2 \sigma_k^2(i)} \right) \, p(k \mid x_i, \Theta^{t-1}) \right] - \frac{\partial}{\partial \sigma_k^2(i)} \left[ \lambda_\sigma \sum_{j \in \mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j))^2 \right] \\ &= \sum_{i=1}^{N} \left[ -\frac{1}{2 \sigma_k^2(i)} + \frac{(x_i - \mu_k(i))^2}{2 (\sigma_k^2(i))^2} \right] p(k \mid x_i, \Theta^{t-1}) - 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j)) \\ &= 0 \end{aligned}

令其为 0:

i=1N[12σk2(i)+(xiμk(i))22(σk2(i))2]p(kxi,Θt1)=2λσjN(i)(σk2(i)σk2(j))\sum_{i=1}^{N} \left[-\frac{1}{2\sigma_k^2(i)} + \frac{(x_i - \mu_k(i))^2}{2 (\sigma_k^2(i))^2} \right] p(k \mid x_i, \Theta^{t-1}) = 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j))

为了便于求解,我们将其改写成标准形式:

i=1Np(kx_i,Θt1)(xiμk(i))2σk2(i)2(σk2(i))2=2λσjN(i)(σk2(i)σk2(j))\sum_{i=1}^{N} p(k \mid x\_i, \Theta^{t-1}) \frac{(x_i - \mu_k(i))^2 - \sigma_k^2(i)}{2 (\sigma_k^2(i))^2} = 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} (\sigma_k^2(i) - \sigma_k^2(j))

两边同时乘以 2(σ_k2(i))22 (\sigma\_k^2(i))^2

i=1Np(kx_i,Θt1)[(xiμk(i))2σk2(i)]=4λσ(σk2(i))2jN(i)(1σk2(j)σk2(i))\sum_{i=1}^{N} p(k \mid x\_i, \Theta^{t-1}) \left[ (x_i - \mu_k(i))^2 - \sigma_k^2(i) \right] = 4 \lambda_\sigma (\sigma_k^2(i))^2 \sum_{j \in \mathcal{N}(i)} \left(1 - \frac{\sigma_k^2(j)}{\sigma_k^2(i)} \right)

这个方程是一个非线性方程,严格解析解比较复杂,因此通常采用迭代方式更新 σ_k2(i)\sigma\_k^2(i)

σk2(i)i=1Np(kx_i,Θt1)(xiμk(i))2+4λσjN(i)σk2(j)i=1Np(kx_i,Θt1)+4λσN(i)\sigma_k^2(i) \gets \frac{\sum\limits_{i=1}^{N} p(k \mid x\_i, \Theta^{t-1}) (x_i - \mu_k(i))^2 + 4 \lambda_\sigma \sum\limits_{j \in \mathcal{N}(i)} \sigma_k^2(j)}{\sum\limits_{i=1}^{N} p(k \mid x\_i, \Theta^{t-1}) + 4 \lambda_\sigma |\mathcal{N}(i)|}

由此可以得到更新 σk2(i)\sigma_k^2(i) 的方程:

σk2(i)=i=1Np(kxi,Θt1)(xiμk(i))2+4λσjN(i)σk2(j)i=1Np(kxi,Θt1)+4λσN(i)(1.11)\sigma_k^2(i) = \frac{\sum\limits_{i=1}^{N} p(k \mid x_i, \Theta^{t-1}) \, (x_i - \mu_k(i))^2 + 4 \lambda_\sigma \sum\limits_{j \in \mathcal{N}(i)} \sigma_k^2(j)}{\sum\limits_{i=1}^{N} p(k \mid x_i, \Theta^{t-1}) + 4 \lambda_\sigma |\mathcal{N}(i)|} \qquad (1.11)

最后,对权重 αk(i)\alpha_k(i) 求偏导,可以将 Q(ΘΘt1)Q(\Theta \mid \Theta^{t-1})lnαk(i)\ln \alpha_k(i) 求偏导,并令其为零,有:

Q(ΘΘt1)lnαk(i)=lnαk(i)[i=1Nln(αk(i))p(kxi,Θt1)λαjN(i)(lnαk(i)lnαk(j))2]=i=1Np(kxi,Θt1)2λαjN(i)(lnαk(i)lnαk(j))=0\begin{aligned} \frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \ln \alpha_k(i)} &= \frac{\partial}{\partial \ln \alpha_k(i)} \Bigg[ \sum_{i=1}^{N} \ln (\alpha_k(i)) \, p(k \mid x_i, \Theta^{t-1}) - \lambda_\alpha \sum_{j \in \mathcal{N}(i)} (\ln \alpha_k(i) - \ln \alpha_k(j))^2 \Bigg] \\ &= \sum_{i=1}^{N} p(k \mid x_i, \Theta^{t-1}) - 2 \lambda_\alpha \sum_{j \in \mathcal{N}(i)} (\ln \alpha_k(i) - \ln \alpha_k(j)) \\ &= 0 \end{aligned}

由此可以得到更新 αk(i)\alpha_k(i) 的方程(经过指数映射保证非负并归一化):

αk(i)=exp(1N(i)jN(i)lnαk(j)+p(kxi,Θt1)2λαN(i))(1.12)\alpha_k(i) = \exp\Bigg(\frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \ln \alpha_k(j) + \frac{p(k \mid x_i, \Theta^{t-1})}{2 \lambda_\alpha |\mathcal{N}(i)|}\Bigg) \qquad (1.12)

对像素 i 的 M 个分布进行归一化,如下:

αk(i)=αk(i)l=1Mαl(i)\alpha_k(i) = \frac{\alpha_k(i)}{\sum\limits_{l=1}^{M} \alpha_l(i)}