Lecture 13. EM Algorithm (After-Class)
Notes: As we saw before, many estimation problems require maximizing a probability distribution with respect to an unknown parameter, for example when computing ML estimates of the parameters or MAP estimates of the hidden random variables. For many interesting problems, differentiating the probability distribution with respect to the parameter of interest and setting the derivative to zero yields a nonlinear equation with no closed-form solution. In such cases, we have to resort to numerical optimization.
Example: a binary observation model, where the observation $Y$ is a noisy copy of a latent binary $W$:

$$P_{Y \mid W}(y \mid w) = \begin{cases} \epsilon, & y = \bar{w} \\ 1 - \epsilon, & y = w \end{cases}$$

Let $w = [w_1, \cdots, w_n]^T$ be $n$ i.i.d. samples of $W$, with observations $y = [y_1, \cdots, y_n]^T$.

ML:

$$P_Y(y; \epsilon, \delta) = \prod_{i=1}^n P_{Y_i}(y_i; \epsilon, \delta) = \prod_{i=1}^n \left[ P_{Y_i \mid W_i}(y_i \mid w_i = 0; \epsilon, \delta)\, P_{W_i}(0; \epsilon, \delta) + P_{Y_i \mid W_i}(y_i \mid w_i = 1; \epsilon, \delta)\, P_{W_i}(1; \epsilon, \delta) \right]$$

Here each $w_i$ is a latent r.v. that is marginalized out.
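This marginal likelihood can be sketched numerically. Since the notes do not define $\delta$, the sketch assumes $\delta = P(W_i = 1)$ parameterizes the latent prior; the function name and sample values are illustrative:

```python
import numpy as np

def likelihood(y, eps, delta):
    # Marginal likelihood P_Y(y; eps, delta) of the binary example.
    # Assumption (delta is not defined in the notes): delta = P(W_i = 1),
    # so 1 - delta = P(W_i = 0).
    p_given_0 = np.where(y == 0, 1 - eps, eps)   # P(Y_i = y_i | W_i = 0)
    p_given_1 = np.where(y == 1, 1 - eps, eps)   # P(Y_i = y_i | W_i = 1)
    # Marginalize out the latent W_i, then multiply over the i.i.d. samples.
    return float(np.prod(p_given_0 * (1 - delta) + p_given_1 * delta))

y = np.array([0, 1, 1, 0, 1])
print(likelihood(y, eps=0.1, delta=0.5))  # with delta = 0.5 every factor is 0.5
```

With $\delta = 0.5$ each factor collapses to $0.5$ regardless of $\epsilon$, a hint that the likelihood carries limited information about the pair $(\epsilon, \delta)$ jointly.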
General setup

Assume complete data $z$, generated by $P_Z(\cdot; x)$, where $x$ is the parameter to estimate and $y$ is the observed (incomplete) data. Decompose

$$\log P_Y(y; x) = U(x, x') + V(x, x'),$$

where $U(x, x') = \mathbb{E}[\log P_Z(z; x) \mid Y = y; x']$ and $V(x, x') = -\mathbb{E}[\log P_{Z \mid Y}(z \mid y; x) \mid Y = y; x']$. By Gibbs' inequality, $V(x, x') \geq V(x', x')$ for all $x$.

$\Rightarrow$ If we can find $x$ such that $U(x, x') \geq U(x', x')$, then

$$\log P_Y(y; x) = U(x, x') + V(x, x') \geq U(x', x') + V(x', x') = \log P_Y(y; x')$$
E-step: given the previous estimate $\hat{x}^{(n)}$, compute $U(x, \hat{x}^{(n)})$.

M-step: $\hat{x}^{(n+1)} = \arg\max_x U(x, \hat{x}^{(n)})$

$$\Rightarrow U(\hat{x}^{(n+1)}, \hat{x}^{(n)}) \geq U(\hat{x}^{(n)}, \hat{x}^{(n)})$$

We can get a sequence of $\hat{x}^{(0)}, \hat{x}^{(1)}, \cdots$ such that

$$P_Y(y; \hat{x}^{(0)}) \leq P_Y(y; \hat{x}^{(1)}) \leq \cdots$$
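The monotone-likelihood guarantee can be checked empirically on the binary example above. A minimal sketch, again assuming $\delta = P(W_i = 1)$ (note this model is not identifiable in $(\epsilon, \delta)$ jointly, since the likelihood depends only on $P(Y_i = 1)$, but the monotonicity still holds):

```python
import numpy as np

def em_binary(y, eps0, delta0, iters=50):
    # EM for the binary example, assuming delta = P(W_i = 1).
    # Returns the estimates and the incomplete-data log-likelihood at each
    # iteration; the argument above guarantees it is non-decreasing.
    eps, delta = eps0, delta0
    lls = []
    for _ in range(iters):
        # E-step: responsibilities r_i = P(W_i = 1 | y_i; eps, delta)
        p1 = np.where(y == 1, 1 - eps, eps) * delta
        p0 = np.where(y == 0, 1 - eps, eps) * (1 - delta)
        r = p1 / (p0 + p1)
        lls.append(np.log(p0 + p1).sum())
        # M-step: maximize the expected complete-data log-likelihood U
        delta = r.mean()
        # expected fraction of flipped observations (y_i != w_i)
        eps = np.where(y == 1, 1 - r, r).mean()
    return eps, delta, np.array(lls)

rng = np.random.default_rng(0)
w = (rng.random(2000) < 0.7).astype(int)          # true delta = 0.7
y = np.where(rng.random(2000) < 0.1, 1 - w, w)    # true eps = 0.1 (flips)
eps, delta, lls = em_binary(y, eps0=0.3, delta0=0.6)
print(bool(np.all(np.diff(lls) >= -1e-9)))        # non-decreasing sequence
```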
Example: a Gaussian mixture model with $k$ components and $m$ samples. The parameters are

$$x = [\phi, \mu, \Sigma]: \begin{cases} \phi = [\phi_1, \cdots, \phi_k] \\ \mu = [\mu_1, \cdots, \mu_k] \\ \Sigma = [\Sigma_1, \cdots, \Sigma_k] \end{cases}$$

The complete data is $z = [l_1, \cdots, l_m, y_1, \cdots, y_m]$, where the $l_i$ are the latent component labels, and the observed data is $y = [y_1, \cdots, y_m]$.
The EM Algorithm:

E-step: compute $U(x, \hat{x}^{(n)})$ for all $x$:

$$U(x, \hat{x}^{(n)}) = \sum_{i=1}^m \sum_{j=1}^k w_{ij} \log P(l_i = j, y_i; x)$$

$$= \sum_{i=1}^m \sum_{j=1}^k w_{ij} \left( \log P(l_i = j; x) + \log P(y_i \mid l_i = j; x) \right)$$

$$= \sum_{i=1}^m \sum_{j=1}^k w_{ij} \left( \log \phi_j - \frac{1}{2} \log\left( (2\pi)^d |\Sigma_j| \right) - \frac{1}{2} (y_i - \mu_j)^T \Sigma_j^{-1} (y_i - \mu_j) \right)$$

where $w_{ij} = P(l_i = j \mid y_i; \hat{x}^{(n)})$.

M-step: find $x^* = \arg\max_x U(x, \hat{x}^{(n)})$.
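The E-step weights $w_{ij}$ can be sketched in NumPy. This is an illustrative implementation of the posterior label probabilities for the mixture above, computed in log-space for numerical stability:

```python
import numpy as np

def responsibilities(y, phi, mu, Sigma):
    # E-step weights w_ij = P(l_i = j | y_i; x^(n)) for the Gaussian mixture.
    # Shapes: y (m, d), phi (k,), mu (k, d), Sigma (k, d, d).
    m, d = y.shape
    k = phi.shape[0]
    logw = np.empty((m, k))
    for j in range(k):
        diff = y - mu[j]                                  # y_i - mu_j, (m, d)
        inv = np.linalg.inv(Sigma[j])
        quad = np.einsum('id,de,ie->i', diff, inv, diff)  # Mahalanobis terms
        _, logdet = np.linalg.slogdet(Sigma[j])
        # log phi_j + log N(y_i; mu_j, Sigma_j), matching the last line of U
        logw[:, j] = np.log(phi[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)
    logw -= logw.max(axis=1, keepdims=True)   # stabilize before exponentiating
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)   # normalize: each row sums to 1
```

For example, with two well-separated unit-covariance components, a sample sitting at $\mu_1$ receives $w_{i1} \approx 1$.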
M-step in detail: updating the parameters.
Take the derivative with respect to $\phi_j$. Note that $\sum_{j=1}^k \phi_j = 1$, so use the Lagrange multiplier method:

$$\frac{\partial}{\partial \phi_j} \left[ U(x, \hat{x}^{(n)}) - \lambda \left( \sum_{j=1}^k \phi_j - 1 \right) \right] \Bigg|_{\phi_j = \hat{\phi}_j^{(n+1)}} = \frac{\sum_{i=1}^m w_{ij}}{\hat{\phi}_j^{(n+1)}} - \lambda = 0$$

By $\sum_{j=1}^k \phi_j = 1$, $\lambda = \sum_{i=1}^m \sum_{j=1}^k w_{ij} = m$, and then

$$\hat{\phi}_j^{(n+1)} = \frac{1}{m} \sum_{i=1}^m w_{ij}$$
Take the derivative with respect to $\mu_j$ and set it to zero. Then, we have

$$\hat{\mu}_j^{(n+1)} = \frac{\sum_{i=1}^m w_{ij}\, y_i}{\sum_{i=1}^m w_{ij}}$$
Take the derivative with respect to $\Sigma_j$:

$$\frac{\partial U(x, \hat{x}^{(n)})}{\partial \Sigma_j} \Bigg|_{\Sigma_j = \hat{\Sigma}_j^{(n+1)},\, \mu_j = \hat{\mu}_j^{(n+1)}} = -\frac{1}{2} \sum_{i=1}^m w_{ij} \left( \left(\hat{\Sigma}_j^{(n+1)}\right)^{-1} - \left(\hat{\Sigma}_j^{(n+1)}\right)^{-1} (y_i - \hat{\mu}_j^{(n+1)})(y_i - \hat{\mu}_j^{(n+1)})^T \left(\hat{\Sigma}_j^{(n+1)}\right)^{-1} \right) = 0$$

$$\Rightarrow \sum_{i=1}^m w_{ij} \left( \hat{\Sigma}_j^{(n+1)} - (y_i - \hat{\mu}_j^{(n+1)})(y_i - \hat{\mu}_j^{(n+1)})^T \right) = 0$$

Then, we have

$$\hat{\Sigma}_j^{(n+1)} = \frac{\sum_{i=1}^m w_{ij}\, (y_i - \hat{\mu}_j^{(n+1)})(y_i - \hat{\mu}_j^{(n+1)})^T}{\sum_{i=1}^m w_{ij}}$$
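The three closed-form updates can be collected into one M-step routine. A minimal NumPy sketch (the function name and shapes are illustrative; `y` holds the samples and `w` the E-step weights $w_{ij}$):

```python
import numpy as np

def m_step(y, w):
    # Closed-form M-step updates derived above.
    # Shapes: y (m, d) samples, w (m, k) E-step weights w_ij.
    m, d = y.shape
    k = w.shape[1]
    Nj = w.sum(axis=0)                     # sum_i w_ij, one entry per component
    phi = Nj / m                           # phi_j = (1/m) sum_i w_ij
    mu = (w.T @ y) / Nj[:, None]           # mu_j: weighted sample means
    Sigma = np.empty((k, d, d))
    for j in range(k):
        diff = y - mu[j]                   # y_i - mu_j^(n+1)
        # weighted outer products, normalized by sum_i w_ij
        Sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j]
    return phi, mu, Sigma
```

As a sanity check, if `w` is one-hot (hard assignments), each update reduces to the per-cluster empirical proportion, mean, and covariance.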