The Expectation Maximization Algorithm
Frank Dellaert
Abstract
This note represents my attempt at explaining the EM algorithm (Hartley, 1958;
Dempster et al., 1977; McLachlan and Krishnan, 1997). This is just a slight
variation on Tom Minka’s tutorial (Minka, 1998), perhaps a little easier (or perhaps
not). It includes a graphical example to provide some intuition.
1 Intuitive Explanation of EM
EM is an iterative optimization method for estimating unknown parameters Θ from measurement
data U, in the presence of "hidden" nuisance variables J that are not observed and must
be integrated out. In particular, we want to maximize the posterior
probability of the parameters Θ given the data U, marginalizing over J:
$$\Theta^* = \operatorname*{argmax}_{\Theta} \; \sum_{J \in \mathcal{J}^n} P(\Theta, J \mid U) \tag{1}$$
The intuition behind EM is an old one: alternate between estimating the unknowns
Θ and the hidden variables J. This idea has been around for a long time. However,
instead of finding the best J ∈ J^n given an estimate Θ at each iteration, EM computes a
distribution over the space J^n. One of the earliest papers on EM is (Hartley, 1958), but
the seminal reference that formalized EM and provided a proof of convergence is the
“DLR” paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book
devoted entirely to EM and applications is (McLachlan and Krishnan, 1997), whereas
(Tanner, 1996) is another popular and very useful reference.
One of the most insightful explanations of EM, one that provides a deeper understanding
of its operation than the intuition of alternating between variables, is in terms of lower-
bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the
E-step can be interpreted as constructing a local lower-bound to the posterior distribu-
tion, whereas the M-step optimizes the bound, thereby improving the estimate for the
unknowns. This is demonstrated below for a simple example.
Figure 1: EM example: Mixture components and data. The data consists of three
samples drawn from each mixture component, shown above as circles and triangles.
The means of the mixture components are −2 and 2, respectively.
Figure 2: The true likelihood function of the two component means θ1 and θ2, given
the data in Figure 1.
[Figure 3: surface plots of the lower bound over (θ1, θ2) at successive EM iterations; the first panel is titled "i=1, Q=−3.279564".]
Consider the mixture estimation problem shown in Figure 1, where the goal is to estimate
the two component means θ1 and θ2 given 6 samples drawn from the mixture,
but without knowing from which mixture component each sample was drawn. The state space is
two-dimensional, and the true likelihood function is shown in Figure 2. Note that there
are two modes, located respectively at (−2, 2) and (2, −2). This makes perfect sense,
as we can switch the mixture components without affecting the quality of the solution.
Note also that the true likelihood is computed by integrating over all possible data as-
sociations, and hence we can find a maximum likelihood solution without solving a
correspondence problem. However, even for only 6 samples, this requires summing
over the space of 2^6 = 64 possible data associations.
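To make the cost of that summation concrete, here is a minimal sketch that evaluates the true log-likelihood of a candidate (θ1, θ2) by brute-force enumeration of all 2^6 data associations. The six sample values, the unit component variances, and the equal mixture weights are illustrative assumptions, not values taken from the figure.

```python
import itertools
import math

# Hypothetical stand-in for the 6 samples of Figure 1 (3 per component);
# the actual values used in the note's figures are not given in the text.
samples = [-2.5, -1.9, -1.4, 1.3, 2.1, 2.6]


def log_gaussian(x, mean, var=1.0):
    """Log-density of a Gaussian with (assumed) known variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def true_log_likelihood(theta1, theta2):
    """log P(U | theta1, theta2), summing over all 2^6 data associations J."""
    total = 0.0
    for J in itertools.product([0, 1], repeat=len(samples)):
        # log P(U, J | theta) for one association, with equal mixture weights.
        log_joint = sum(
            math.log(0.5) + log_gaussian(x, theta1 if j == 0 else theta2)
            for x, j in zip(samples, J)
        )
        total += math.exp(log_joint)
    return math.log(total)


# The two symmetric modes of Figure 2 give the same value:
print(true_log_likelihood(-2.0, 2.0))
print(true_log_likelihood(2.0, -2.0))
```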
EM proceeds as follows in this example. In the E-step, a "soft" assignment is computed
that assigns a posterior probability to each possible association of each individual sample.
In the current example, there are 2 mixture components and 6 samples, so the computed
probabilities can be represented in a 2 × 6 table. Given these probabilities, EM computes a
tight lower bound to the true likelihood function of Figure 2. The bound is constructed
such that it touches the likelihood function at the current estimate, and it is only close to
the true likelihood in the neighborhood of this estimate. The bound and its corresponding
probability table are computed at each iteration, as shown in Figure 3. In this case,
EM was run for 5 iterations. In the M-step, the lower bound is maximized (the maximizer is
shown by a black asterisk in the figure), and the corresponding new estimate (θ1, θ2) is
guaranteed to have a posterior probability at least as high as that of the current estimate.
Each successive bound is a better approximation to the likelihood near its mode, until at
convergence the bound touches the likelihood at the local maximum, and progress can
no longer be made. This is shown in the last panel of Figure 3. The soft-assignment and
mean-update computations for this example are sketched below.
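The following is a minimal sketch of these two steps for the example, under the same illustrative assumptions as above (hypothetical sample values, unit variances, equal mixture weights, and a flat prior on the means so the M-step reduces to maximum likelihood): the E-step fills the 2 × 6 responsibility table, and the M-step updates each mean as a responsibility-weighted average.

```python
import math

# Same illustrative samples as before (hypothetical, not taken from the figure).
samples = [-2.5, -1.9, -1.4, 1.3, 2.1, 2.6]


def em_two_means(theta1, theta2, iters=5):
    """EM for the two component means of a unit-variance, equal-weight mixture,
    assuming a flat prior on the means (so the M-step is maximum likelihood)."""
    for _ in range(iters):
        # E-step: the 2 x 6 table of posterior assignment probabilities;
        # resp[i] is P(component 1 | sample i, current means).
        resp = []
        for x in samples:
            p1 = math.exp(-0.5 * (x - theta1) ** 2)
            p2 = math.exp(-0.5 * (x - theta2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: maximize the bound; for Gaussian means this reduces to
        # responsibility-weighted averages of the samples.
        theta1 = sum(r * x for r, x in zip(resp, samples)) / sum(resp)
        theta2 = sum((1 - r) * x for r, x in zip(resp, samples)) / sum(
            1 - r for r in resp
        )
    return theta1, theta2


print(em_two_means(0.5, -0.5))  # ends up near one of the two modes of Figure 2
```

Starting from a guess such as (0.5, −0.5), a handful of iterations should already land close to one of the two symmetric modes.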
The idea behind EM is to start with a guess Θ^t for the parameters Θ, construct an
easily computed lower bound B(Θ; Θ^t) to the function log P(Θ|U), and maximize
that bound instead. If iterated, this procedure converges to a local maximizer Θ^* of
the objective function, provided the bound improves at each iteration. In log form, the
objective (1) is

$$\Theta^* = \operatorname*{argmax}_{\Theta} \; \log \sum_{J \in \mathcal{J}^n} P(\Theta, J \mid U) \tag{2}$$

To motivate the bound, note that the key problem with maximizing (2) is that it involves the
logarithm of a (big) sum, which is difficult to deal with. Fortunately, we can construct
a tractable lower bound B(Θ; Θt ) that instead contains a sum of logarithms. To derive
the bound, first trivially rewrite log P (U, Θ) as
$$\log P(U, \Theta) \;=\; \log \sum_{J \in \mathcal{J}^n} P(U, J, \Theta) \;=\; \log \sum_{J \in \mathcal{J}^n} f^t(J)\,\frac{P(U, J, \Theta)}{f^t(J)}$$

where f^t(J) is an arbitrary probability distribution over the space J^n of hidden variables J. By Jensen's inequality, we have

$$B(\Theta; \Theta^t) \;\stackrel{\Delta}{=}\; \sum_{J \in \mathcal{J}^n} f^t(J) \log \frac{P(U, J, \Theta)}{f^t(J)} \;\leq\; \log \sum_{J \in \mathcal{J}^n} f^t(J)\,\frac{P(U, J, \Theta)}{f^t(J)}$$
Note that we have transformed a log of sums into a sum of logs, which was the prime
motivation.
EM goes one step further and tries to find the best bound, defined as the bound B(Θ; Θ^t)
that touches the objective function log P(U, Θ) at the current guess Θ^t. Intuitively,
finding the best bound at each iteration will guarantee that we obtain an improved es-
timate Θ^{t+1} when we locally maximize the bound with respect to Θ. Since we know
B(Θ; Θ^t) to be a lower bound, the optimal bound at Θ^t can be found by maximizing

$$B(\Theta^t; \Theta^t) = \sum_{J \in \mathcal{J}^n} f^t(J) \log \frac{P(U, J, \Theta^t)}{f^t(J)} \tag{3}$$

with respect to the distribution f^t(J).
The maximizing distribution is the posterior over the hidden variables, f^t(J) = P(J|U, Θ^t).
By examining the value of the resulting optimal bound at Θ^t we see that it indeed
touches the objective function:

$$B(\Theta^t; \Theta^t) = \sum_{J \in \mathcal{J}^n} P(J|U, \Theta^t) \log \frac{P(U, J, \Theta^t)}{P(J|U, \Theta^t)} = \log P(U, \Theta^t)$$

since P(U, J, Θ^t)/P(J|U, Θ^t) = P(U, Θ^t) for every J.
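A quick numerical sanity check of this property, under the same illustrative mixture assumptions as before (hypothetical samples, unit variances, equal weights, flat prior): the bound built with f^t(J) = P(J|U, Θ^t) equals log P(U, Θ^t) at Θ = Θ^t and stays below log P(U, Θ) elsewhere.

```python
import itertools
import math

# Same illustrative setup as before: hypothetical samples, unit variances,
# equal mixture weights, and a flat prior on (theta1, theta2).
samples = [-2.5, -1.9, -1.4, 1.3, 2.1, 2.6]
assignments = list(itertools.product([0, 1], repeat=len(samples)))


def log_joint(theta, J):
    """log P(U, J, theta), up to the constant flat-prior term."""
    return sum(
        math.log(0.5) - 0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta[j]) ** 2
        for x, j in zip(samples, J)
    )


def log_objective(theta):
    """log P(U, theta) = log of the sum of P(U, J, theta) over all J."""
    return math.log(sum(math.exp(log_joint(theta, J)) for J in assignments))


def bound(theta, theta_t):
    """B(theta; theta_t) with f^t(J) = P(J | U, theta_t)."""
    log_z = log_objective(theta_t)
    total = 0.0
    for J in assignments:
        f = math.exp(log_joint(theta_t, J) - log_z)  # posterior f^t(J)
        total += f * (log_joint(theta, J) - math.log(f))
    return total


theta_t = (0.5, -0.5)
print(bound(theta_t, theta_t), log_objective(theta_t))          # equal: the bound touches
print(bound((2.0, -2.0), theta_t), log_objective((2.0, -2.0)))  # bound <= objective
```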
2.2 Maximizing The Bound
With f^t(J) = P(J|U, Θ^t), the bound can be decomposed as

$$B(\Theta; \Theta^t) = \sum_{J \in \mathcal{J}^n} f^t(J)\,\big[\log P(U, J|\Theta) + \log P(\Theta) - \log f^t(J)\big] = Q^t(\Theta) + \log P(\Theta) + H^t$$

where Q^t(Θ) ≜ ⟨log P(U, J|Θ)⟩ is the expected log-likelihood under f^t, and H^t is the
entropy of f^t. Since H^t does not depend on Θ, we can maximize the bound with respect
to Θ using the first two terms only:

$$\Theta^{t+1} = \operatorname*{argmax}_{\Theta}\,\big[\,Q^t(\Theta) + \log P(\Theta)\,\big] \tag{4}$$
At each iteration, the EM algorithm first finds an optimal lower bound B(Θ; Θ^t) at the
current guess Θ^t (equation 3), and then maximizes this bound to obtain an improved
estimate Θ^{t+1} (equation 4). Because the bound is expressed as an expectation, the first
step is called the "expectation-step" or E-step, whereas the second step is called the
"maximization-step" or M-step. The EM algorithm can thus be conveniently summarized
as the following two steps (a generic sketch of the resulting loop is given below):
• E-step: calculate f^t(J) ≜ P(J|U, Θ^t)
• M-step: Θ^{t+1} = argmax_Θ [Q^t(Θ) + log P(Θ)]
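Read as pseudocode, the two steps amount to nothing more than the following loop; the function names, the callback structure, and the fixed iteration count are illustrative choices rather than anything prescribed by the note.

```python
def em(theta0, e_step, m_step, num_iters=10):
    """Generic EM loop: the E-step computes f^t(J) = P(J | U, theta^t),
    and the M-step returns argmax over theta of Q^t(theta) + log P(theta)."""
    theta = theta0
    for _ in range(num_iters):
        f_t = e_step(theta)   # E-step: distribution over the hidden variables
        theta = m_step(f_t)   # M-step: maximize the optimal lower bound
    return theta
```

For the mixture example above, e_step would return the 2 × 6 responsibility table and m_step the responsibility-weighted means.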
A Relation to the Expected Log-Posterior
Note that we have chosen to define Q^t(Θ) as the expected log-likelihood, as in (Dempster
et al., 1977; McLachlan and Krishnan, 1997), i.e.,

$$Q^t(\Theta) \;\stackrel{\Delta}{=}\; \langle \log P(U, J|\Theta) \rangle$$

where ⟨·⟩ denotes the expectation with respect to f^t(J) = P(J|U, Θ^t). One could equally
work with the expected log-posterior, which by Bayes' rule satisfies

$$\langle \log P(\Theta|U, J) \rangle = \langle \log P(U, J|\Theta) + \log P(\Theta) - \log P(U, J) \rangle \tag{5}$$
Here the second term does not depend on J and can be taken out of the expectation,
and the last term does not depend on Θ. Hence, maximizing (5) with respect to Θ is
equivalent to (4):
$$\operatorname*{argmax}_{\Theta}\, \langle \log P(\Theta|U, J) \rangle = \operatorname*{argmax}_{\Theta}\, \big[\, \langle \log P(U, J|\Theta) \rangle + \log P(\Theta) \,\big] = \operatorname*{argmax}_{\Theta}\, \big[\, Q^t(\Theta) + \log P(\Theta) \,\big]$$
References
[1] Dempster, A., Laird, N., and Rubin, D. (1977). Maximum likelihood from incom-
plete data via the EM algorithm. Journal of the Royal Statistical Society, Series B,
39(1):1–38.
[2] Hartley, H. (1958). Maximum likelihood estimation from incomplete data. Bio-
metrics, 14:174–194.
[3] McLachlan, G. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley
Series in Probability and Statistics. John Wiley & Sons.
[4] Minka, T. (1998). Expectation-Maximization as lower bound maximization. Tutorial
published on the web at http://www-white.media.mit.edu/~tpminka/papers/em.html.
[5] Neal, R. and Hinton, G. (1998). A view of the EM algorithm that justifies incre-
mental, sparse, and other variants. In Jordan, M., editor, Learning in Graphical
Models. Kluwer Academic Press.
[6] Tanner, M. (1996). Tools for Statistical Inference. Springer Verlag, New York.
Third Edition.