Linear Discriminant Analysis: Intelligent Data Analysis and Probabilistic Inference
Lecture 15: Linear Discriminant Analysis
Recommended reading: Bishop, Chapter 4.1; Hastie et al., Chapter 4.3
[Figure: class samples and their projection onto a one-dimensional subspace; horizontal axis $x_1$. Adapted from PRML (Bishop, 2006).]
Orthogonal Projections (Repetition)
Classification as Projection

[Figure: projecting inputs onto a weight vector $w$; the bias/threshold is $w_0$.]
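As a minimal illustration (not from the slides; the function name is my own), classifying by projection amounts to computing the one-dimensional coordinate $y(x) = w^\top x + w_0$ and thresholding it at zero, assuming the standard linear discriminant form:

```python
import numpy as np

def classify_by_projection(x, w, w0):
    """Project x onto the weight vector w, shift by the bias w0,
    and assign class C1 if the 1-D coordinate is positive."""
    y = w @ x + w0
    return 1 if y > 0 else 2  # 1 = class C1, 2 = class C2
```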
- Look at the log-probability ratio: the decision boundary (choose $C_1$ or $C_2$) is where $\log \frac{p(C_1 \mid x)}{p(C_2 \mid x)} = 0$.
- Assume Gaussian likelihoods $p(x \mid C_i) = \mathcal{N}(x \mid m_i, \Sigma)$ with the same covariance matrix $\Sigma$ for both classes.
- By Bayes' rule,
$$\log \frac{p(C_1 \mid x)}{p(C_2 \mid x)} = \log \frac{p(x \mid C_1)}{p(x \mid C_2)} + \log \frac{p(C_1)}{p(C_2)},$$
where the decision boundary (for $C_1$ or $C_2$) is at 0.
- Inserting the Gaussian likelihoods and setting $\log \frac{p(C_1 \mid x)}{p(C_2 \mid x)} = 0$:
$$\Leftrightarrow\quad \log \frac{p(C_1)}{p(C_2)} - \frac{1}{2}\left(m_1^\top \Sigma^{-1} m_1 - m_2^\top \Sigma^{-1} m_2\right) + (m_1 - m_2)^\top \Sigma^{-1} x = 0$$
$$\Leftrightarrow\quad (m_1 - m_2)^\top \Sigma^{-1} x = \frac{1}{2}\left(m_1^\top \Sigma^{-1} m_1 - m_2^\top \Sigma^{-1} m_2\right) - \log \frac{p(C_1)}{p(C_2)}$$
The boundary is linear in $x$, with normal vector $\Sigma^{-1}(m_1 - m_2)$.
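To make the closed form concrete, here is a small NumPy sketch (illustrative; the function name and the assumption that $m_1$, $m_2$, $\Sigma$, and the priors are already estimated are mine):

```python
import numpy as np

def lda_decide(x, m1, m2, Sigma, p1=0.5, p2=0.5):
    """Evaluate the decision rule derived above: assign C1 iff
    (m1 - m2)^T Sigma^{-1} x exceeds the constant right-hand side."""
    Sigma_inv = np.linalg.inv(Sigma)
    lhs = (m1 - m2) @ Sigma_inv @ x
    rhs = 0.5 * (m1 @ Sigma_inv @ m1 - m2 @ Sigma_inv @ m2) - np.log(p1 / p2)
    return 1 if lhs > rhs else 2  # log-ratio > 0 means C1 is more probable
```

Note that only the projection $(m_1 - m_2)^\top \Sigma^{-1} x$ depends on $x$; everything on the right-hand side is a precomputable threshold.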
- Estimate the class means from the training data:
$$m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n\,, \qquad m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n$$
- Measure class separation as the distance of the projected class means:
$$\tilde{m}_2 - \tilde{m}_1 = w^\top (m_2 - m_1)\,, \qquad \tilde{m}_k = w^\top m_k$$
- Measure the within-class scatter of the samples around their class means:
$$S_W = \sum_{k} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^\top$$
- In the two-class case, the between-class scatter is $S_B = (m_2 - m_1)(m_2 - m_1)^\top$.
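A sketch of how both scatter matrices could be estimated from a data matrix `X` (shape $N \times D$) and integer labels `y` (illustrative; it uses the general multiclass form $S_B = \sum_k N_k (m_k - m)(m_k - m)^\top$, which reduces, up to scale, to the two-class expression above):

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class scatter S_W and between-class scatter S_B."""
    m = X.mean(axis=0)                     # overall mean
    D = X.shape[1]
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for k in np.unique(y):
        X_k = X[y == k]                    # samples of class k
        m_k = X_k.mean(axis=0)
        centered = X_k - m_k
        S_W += centered.T @ centered       # sum_n (x_n - m_k)(x_n - m_k)^T
        d = (m_k - m)[:, None]
        S_B += len(X_k) * (d @ d.T)        # N_k (m_k - m)(m_k - m)^T
    return S_W, S_B
```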
Objective

Find $w^\ast$ that maximizes
$$J(w) = \frac{w^\top S_B w}{w^\top S_W w}$$
We find $w$ by setting $dJ/dw = 0$:
$$\frac{dJ}{dw} = 0 \;\Leftrightarrow\; \left(w^\top S_W w\right) S_B w - \left(w^\top S_B w\right) S_W w = 0 \;\Leftrightarrow\; S_B w - J\, S_W w = 0 \;\Leftrightarrow\; S_W^{-1} S_B w - J w = 0$$
Hence $w$ must be an eigenvector of $S_W^{-1} S_B$ with eigenvalue $J(w)$, and the maximizer $w^\ast$ is the eigenvector with the largest eigenvalue.
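Numerically, $w^\ast$ is just the leading eigenvector of $S_W^{-1} S_B$; a short sketch (illustrative, reusing `scatter_matrices` from above):

```python
import numpy as np

def fisher_direction(S_W, S_B):
    """Return the w* maximizing J(w): the eigenvector of
    S_W^{-1} S_B with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    w = eigvecs[:, np.argmax(eigvals.real)].real
    return w / np.linalg.norm(w)  # the scale of w does not change J(w)
```

For two classes, $S_B w$ is always parallel to $m_2 - m_1$, so this reduces to the familiar closed form $w^\ast \propto S_W^{-1}(m_2 - m_1)$.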
In practice, LDA for $k$ classes proceeds as follows:
1. Mean normalization
2. Compute mean vectors $m_i \in \mathbb{R}^D$ for all $k$ classes
3. Compute the scatter matrices $S_W$ and $S_B$
4. Compute the eigenvectors and eigenvalues of $S_W^{-1} S_B$
5. Collect the eigenvectors with the largest eigenvalues (at most $k - 1$) as the columns of a projection matrix $W$
6. Project samples onto the new subspace using $W$ and compute the new coordinates as $Y = XW$ (see the sketch after this list)
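Putting the six steps together, a compact end-to-end sketch (illustrative; it reuses `scatter_matrices` from above, and all names are my own):

```python
import numpy as np

def lda_fit_transform(X, y, n_components):
    """Steps 1-6: mean-normalize, build scatter matrices,
    solve the eigenproblem, and project Y = X W."""
    X = X - X.mean(axis=0)                                       # 1. mean normalization
    S_W, S_B = scatter_matrices(X, y)                            # 2.+3. means and scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)   # 4. eigenproblem
    order = np.argsort(eigvals.real)[::-1]                       # 5. sort by eigenvalue
    W = eigvecs[:, order[:n_components]].real                    #    keep top components
    return X @ W                                                 # 6. Y = X W
```

scikit-learn offers the same projection off the shelf via `sklearn.discriminant_analysis.LinearDiscriminantAnalysis(n_components=...).fit_transform(X, y)`.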
- LDA's most discriminant features are the means of the data distributions.
- LDA will fail when the discriminatory information is not in the mean but in the variance of the data (illustrated below).
- If the data distributions are strongly non-Gaussian, the LDA projections will not preserve the complex structure of the data that may be required for classification.
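The second limitation is easy to demonstrate on synthetic data where both classes share a mean and differ only in spread (illustrative sketch, reusing `scatter_matrices` from above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes with (nearly) identical means but very different variance:
X1 = rng.normal(loc=0.0, scale=0.5, size=(200, 2))   # tight class
X2 = rng.normal(loc=0.0, scale=3.0, size=(200, 2))   # spread-out class
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

S_W, S_B = scatter_matrices(X, y)
print(S_B)  # approximately the zero matrix: the class means coincide,
            # so J(w) is near 0 for every w and LDA finds no useful direction
```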
References I

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.