Expectation Maximization (EM)
• Objectives:
Jensen’s Inequality (Special Case)
EM Theorem Proof
EM Example – Missing Data
Application: Hidden Markov Models
• Resources:
Wiki: EM History
T.D.: Brown CS Tutorial
UIUC: Tutorial
F.J.: Statistical Methods
The Expectation Maximization Algorithm (Preview)
Proof (a special case of Jensen's inequality: for any two distributions p(x) and q(x), the Kullback–Leibler divergence \(\sum_x p(x)\log[p(x)/q(x)]\) is nonnegative):
\[
\sum_x p(x)\log p(x) \;-\; \sum_x p(x)\log q(x) \;\ge\; 0
\]
\[
\sum_x p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0
\qquad\Longleftrightarrow\qquad
\sum_x p(x)\log\frac{q(x)}{p(x)} \;\le\; 0
\]
\[
\sum_x p(x)\log\frac{q(x)}{p(x)}
\;\le\; \sum_x p(x)\left(\frac{q(x)}{p(x)} - 1\right)
\;=\; \sum_x q(x) \;-\; \sum_x p(x) \;=\; 1 - 1 \;=\; 0
\]
The last step follows from a standard bound on the natural logarithm: \(\ln x \le x - 1\) for all \(x > 0\). We note that since both \(p(x)\) and \(q(x)\) are probability distributions, each must sum to 1.0, so the final difference vanishes and the inequality holds.
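As a quick sanity check, here is a minimal numerical sketch of both facts; the distributions are arbitrary illustrative values, and NumPy is assumed:

```python
import numpy as np

# Two arbitrary discrete distributions over the same support
# (values chosen for illustration only).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# KL divergence: sum_x p(x) log(p(x)/q(x)) -- nonnegative by the proof above.
kl = np.sum(p * np.log(p / q))
print(f"KL(p || q) = {kl:.4f}")  # >= 0 for any pair of distributions

# The bound used in the last step: ln x <= x - 1 for all x > 0.
x = np.linspace(0.1, 5.0, 50)
assert np.all(np.log(x) <= x - 1 + 1e-12)
```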
The general form of Jensen’s inequality relates a convex function of an integral
to the integral of the convex function and is used extensively in information
theory.
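For reference, one standard statement is: for a convex function \(\varphi\) and a random variable \(X\),
\[
\varphi\big(\mathbb{E}[X]\big) \;\le\; \mathbb{E}\big[\varphi(X)\big],
\]
with the inequality reversed for concave functions such as \(\log\). Applying the concave form to \(X = q(x)/p(x)\) under \(p\), where \(\mathbb{E}_p[q/p] = 1\), recovers the special case proved above.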
Explanation: What exactly have we shown? If the last quantity is greater than
zero, then the new model will be better than the old model. This suggests a
strategy for finding the new parameters, θ: choose them to make the last
quantity positive!
• Many other reestimation algorithms have been derived using this approach.
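To make this strategy concrete, below is a minimal sketch of one such reestimation loop: EM for a two-component Gaussian mixture. The synthetic data, component count, initialization, and iteration count are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data drawn from two Gaussians (for illustration only).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial parameter guesses (assumptions; EM finds only a local optimum).
w = np.array([0.5, 0.5])     # mixture weights
mu = np.array([-1.0, 1.0])   # component means
var = np.array([1.0, 1.0])   # component variances

def gaussian(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior probability (responsibility) of each component per point.
    lik = w * gaussian(x[:, None], mu, var)        # shape (N, 2)
    gamma = lik / lik.sum(axis=1, keepdims=True)

    # M-step: reestimate parameters; each iteration cannot decrease the
    # data likelihood -- the guarantee established by the EM theorem.
    nk = gamma.sum(axis=0)
    w = nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w, "means:", mu, "variances:", var)
```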
Example: Estimating Missing Data