Pattern Recognition Presentation
Classification
• Machine Perception
• An Example
• Pattern Recognition Systems
• The Design Cycle
• Learning and Adaptation
• Conclusion
Machine Perception
An Example
(Figure: the two species to be sorted, sea bass versus salmon.)
• Problem Analysis
• Set up a camera and take some sample images to extract
features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of the mouth, etc…
• This is the set of all suggested features to explore for use in our
classifier!
• Preprocessing
• Classification
• Select the length of the fish as a possible feature for
discrimination
(Figure: the training samples plotted in the lightness–width feature space; a small threshold-classifier sketch follows below.)
Issue of generalization!
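To make the feature idea concrete, here is a minimal sketch of a single-feature threshold classifier in Python; the lightness values and the midpoint threshold are hypothetical, chosen only for illustration, not taken from the slides.

import numpy as np

# Hypothetical lightness measurements for labelled training fish (arbitrary units)
salmon_lightness  = np.array([2.1, 3.0, 2.8, 3.5])
seabass_lightness = np.array([5.2, 6.1, 4.9, 5.8])

# A crude decision threshold: midway between the two class means
threshold = (salmon_lightness.mean() + seabass_lightness.mean()) / 2

def classify(lightness):
    # Decide "salmon" below the threshold, "sea bass" above it
    return "salmon" if lightness < threshold else "sea bass"

print(threshold, classify(2.9), classify(5.6))

Whether such a threshold still works on fish that were not in the training samples is exactly the generalization issue noted above.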
Pattern Recognition Systems
• Sensing
• Use of a transducer (camera or microphone)
• The performance of a PR system depends on the bandwidth, resolution,
sensitivity, and distortion of the transducer
• Feature extraction
• Discriminative features
• Invariant features with respect to translation, rotation and
scale.
• Classification
• Use a feature vector provided by a feature extractor to
assign the object to a category
• Post Processing
• Exploit context: input-dependent information other than the target
pattern itself, used to improve performance (a pipeline sketch of these stages follows below)
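Read as a pipeline, the four stages above could be sketched as follows; every function body here is a placeholder assumption (hypothetical values and a made-up linear rule), not a description of a real system.

import numpy as np

def sense():
    # Sensing: stand-in for a camera/transducer reading (hypothetical values)
    return np.array([4.7, 1.2, 0.8])          # raw measurement summary

def extract_features(raw):
    # Feature extraction: keep the discriminative, roughly invariant quantities
    length, lightness, width = raw
    return np.array([lightness, width])

def classify(features, boundary=np.array([1.0, -1.0]), bias=0.0):
    # Classification: a simple linear decision rule on the feature vector
    return "salmon" if features @ boundary + bias < 0 else "sea bass"

def post_process(label):
    # Post-processing: context (e.g., season, upstream decisions) could
    # overturn a borderline decision; here it simply passes the label through
    return label

print(post_process(classify(extract_features(sense()))))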
The Design Cycle
• Data collection
• Feature Choice
• Model Choice
• Training
• Evaluation
• Computational Complexity
• Data Collection
• How do we know when we have collected an adequately
large and representative set of examples for training and
testing the system?
• Feature Choice
• Depends on the characteristics of the problem domain. Features should
be simple to extract, invariant to irrelevant transformations, and
insensitive to noise.
• Model Choice
• Unsatisfied with the performance of our fish classifier and
want to jump to another class of model
• Training
• Use data to determine the classifier. Many different
procedures for training classifiers and choosing models
• Evaluation
• Measure the error rate (or performance) and switch from
one set of features to another
• Computational Complexity
• What is the trade-off between computational ease and
performance?
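A minimal sketch of the training and evaluation steps of the cycle, assuming hypothetical 1-D lightness features and a simple nearest-class-mean classifier (an illustrative choice, not a method prescribed by the slides):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D lightness features for two classes
train_x = np.concatenate([rng.normal(3, 0.5, 50), rng.normal(6, 0.5, 50)])
train_y = np.array([0]*50 + [1]*50)          # 0 = salmon, 1 = sea bass
test_x  = np.concatenate([rng.normal(3, 0.5, 20), rng.normal(6, 0.5, 20)])
test_y  = np.array([0]*20 + [1]*20)

# Training: estimate one mean per class from the labelled data
means = np.array([train_x[train_y == c].mean() for c in (0, 1)])

# Evaluation: assign each test point to the nearest class mean, measure the error rate
pred = np.argmin(np.abs(test_x[:, None] - means[None, :]), axis=1)
error_rate = np.mean(pred != test_y)
print(f"error rate: {error_rate:.2%}")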
Learning and Adaptation
• Supervised learning
• A teacher provides a category label or cost for each
pattern in the training set
• Unsupervised learning
• The system forms clusters or “natural groupings” of the
input patterns
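The distinction can be shown on the same kind of hypothetical data: the supervised estimate uses the labels, while a tiny 2-means clustering loop forms "natural groupings" from the inputs alone. This is only an illustrative sketch.

import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(3, 0.5, 30), rng.normal(6, 0.5, 30)])
labels = np.array([0]*30 + [1]*30)           # available only in the supervised case

# Supervised: class means come directly from the labels
supervised_means = [x[labels == c].mean() for c in (0, 1)]

# Unsupervised: a small 2-means clustering finds groupings without labels
centers = np.array([x.min(), x.max()])
for _ in range(10):
    assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    centers = np.array([x[assign == k].mean() for k in (0, 1)])

print(supervised_means, centers)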
Conclusion
• Introduction
• Bayesian Decision Theory–Continuous Features
Introduction
• The sea bass/salmon example
• State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable
• P(ω1) = P(ω2) (uniform priors)
Therefore:
whenever we observe a particular x, the probability of
error is :
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1
Therefore:
P(error | x) = min [P(ω1 | x), P(ω2 | x)]
(Bayes decision)
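A small worked illustration (the class-conditional values below are assumptions, not from the slides): Bayes' rule gives the posteriors, the decision takes the larger one, and P(error | x) is the smaller one.

# Assumed class-conditional densities evaluated at a particular x
p_x_given_w1, p_x_given_w2 = 0.30, 0.10
P_w1, P_w2 = 0.5, 0.5                        # uniform priors, as in the slides

evidence = p_x_given_w1 * P_w1 + p_x_given_w2 * P_w2
post1 = p_x_given_w1 * P_w1 / evidence       # P(w1 | x)
post2 = p_x_given_w2 * P_w2 / evidence       # P(w2 | x)

decision = "w1" if post1 > post2 else "w2"
p_error = min(post1, post2)                  # P(error | x) under the Bayes rule
print(decision, p_error)                     # -> w1, 0.25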
Conditional risk
R(αi | x) = ∑ λ(αi | ωj) P(ωj | x)   (sum over j = 1, …, c),   for i = 1, …, a
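A sketch of evaluating this sum for each action, given an assumed loss matrix and assumed posteriors (values chosen only for illustration):

import numpy as np

loss = np.array([[0.0, 1.0],        # lambda(alpha_1 | w_1), lambda(alpha_1 | w_2)
                 [2.0, 0.0]])       # lambda(alpha_2 | w_1), lambda(alpha_2 | w_2)
posteriors = np.array([0.7, 0.3])   # P(w_1 | x), P(w_2 | x), assumed values

# R(alpha_i | x) = sum_j lambda(alpha_i | w_j) * P(w_j | x)
risks = loss @ posteriors
best_action = np.argmin(risks)      # Bayes decision: take the minimum-risk action
print(risks, best_action)           # -> [0.3 1.4], action alpha_1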
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
Likelihood ratio: decide ω1 if
P(x | ω1) / P(x | ω2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]
P(ω1) = 2/3
P(ω2) = 1/3
λ = [1 2; 3 4]   (λ11 = 1, λ12 = 2, λ21 = 3, λ22 = 4)
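As a quick check of these numbers (not part of the original slide text), the threshold from the likelihood-ratio rule can be computed directly:

# lambda_ij = loss for deciding w_i when the true state of nature is w_j
l11, l12, l21, l22 = 1.0, 2.0, 3.0, 4.0
P_w1, P_w2 = 2/3, 1/3

theta = (l12 - l22) / (l21 - l11) * (P_w2 / P_w1)
print(theta)   # -> -0.5: since a likelihood ratio is never negative,
               #    these losses and priors always favor deciding w1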
Pattern Classification
• Minimum-Error-Rate Classification
• Classifiers, Discriminant Functions and Decision Surfaces
• The Normal Density
Minimum-Error-Rate Classification
R(αi | x) = ∑ P(ωj | x)   (sum over j ≠ i)   = 1 − P(ωi | x)
Let θλ = [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]; then decide ω1 if:
P(x | ω1) / P(x | ω2) > θλ
• If λ is the zero-one loss function, which means
λ = [0 1; 1 0],
then θλ = P(ω2) / P(ω1) = θa;
if λ = [0 2; 1 0], then θλ = 2 P(ω2) / P(ω1) = θb
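For example, with the priors used earlier (P(ω1) = 2/3, P(ω2) = 1/3), these thresholds work out to θa = 1/2 and θb = 1.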
g(x) = P(ω1 | x) − P(ω2 | x)
or, equivalently (same sign, hence the same decision rule):
g(x) = ln [P(x | ω1) / P(x | ω2)] + ln [P(ω1) / P(ω2)]
P(x) = [1 / (√(2π) σ)] exp[ −½ ((x − μ) / σ)² ],
where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
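A direct transcription of this density into code (a sketch; the test values are arbitrary):

import math

def normal_pdf(x, mu, sigma):
    # P(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)**2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

print(normal_pdf(0.0, 0.0, 1.0))   # ~0.3989, the standard normal at its mean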
• Multivariate density
• Multivariate normal density in d dimensions is:
P(x) = [1 / ((2π)^(d/2) |Σ|^(1/2))] exp[ −½ (x − μ)ᵗ Σ⁻¹ (x − μ) ]
where:
x = (x1, x2, …, xd)ᵗ (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)ᵗ is the mean vector
Σ is the d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
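The multivariate form transcribes the same way; this sketch evaluates it with numpy on an arbitrary 2-D example:

import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    # P(x) = exp(-0.5*(x-mu)^T Sigma^{-1} (x-mu)) / ((2*pi)^(d/2) * |Sigma|^(1/2))
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

x  = np.array([0.0, 0.0])
mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
print(multivariate_normal_pdf(x, mu, Sigma))   # ~0.1592 = 1/(2*pi)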
The decision boundary satisfies gi(x) = gj(x).
If P(ωi) = P(ωj), then x0 = ½ (μi + μj); in general:
x0 = ½ (μi + μj) − [ ln(P(ωi) / P(ωj)) / ((μi − μj)ᵗ Σ⁻¹ (μi − μj)) ] · (μi − μj)
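A sketch that evaluates x0 for two classes sharing one covariance matrix; the means, covariance, and priors below are assumptions chosen only for illustration:

import numpy as np

mu_i, mu_j = np.array([2.0, 0.0]), np.array([0.0, 0.0])
Sigma = np.eye(2)                      # shared covariance (assumed)
P_i, P_j = 0.6, 0.4                    # assumed priors

diff = mu_i - mu_j
shift = np.log(P_i / P_j) / (diff @ np.linalg.inv(Sigma) @ diff)
x0 = 0.5 * (mu_i + mu_j) - shift * diff
print(x0)   # ~[0.80, 0.00]: the boundary point shifts toward the less probable class' mean

With equal priors the boundary would sit at the midpoint [1, 0]; the unequal priors push it toward μj, giving more of the feature space to the more probable class ωi.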
• Case Σi = arbitrary
• The covariance matrices are different for each category
gi(x) = xᵗ Wi x + wiᵗ x + wi0
where:
Wi = −½ Σi⁻¹
wi = Σi⁻¹ μi
wi0 = −½ μiᵗ Σi⁻¹ μi − ½ ln |Σi| + ln P(ωi)
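A sketch of this quadratic discriminant for the arbitrary-covariance case, comparing two classes with assumed parameters:

import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    # g_i(x) = x^T W_i x + w_i^T x + w_i0, with W_i, w_i, w_i0 as defined above
    Sigma_inv = np.linalg.inv(Sigma)
    W  = -0.5 * Sigma_inv
    w  = Sigma_inv @ mu
    w0 = -0.5 * mu @ Sigma_inv @ mu - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0

x = np.array([1.0, 1.0])
g1 = quadratic_discriminant(x, np.array([0.0, 0.0]), np.eye(2), 0.5)
g2 = quadratic_discriminant(x, np.array([2.0, 2.0]), 2 * np.eye(2), 0.5)
print("decide w1" if g1 > g2 else "decide w2")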
For d independent binary features xi, with pi = P(xi = 1 | ω1) and qi = P(xi = 1 | ω2), the discriminant is linear, g(x) = ∑ wi xi + w0,
where:
wi = ln [ pi (1 − qi) / (qi (1 − pi)) ],   i = 1, …, d
and:
w0 = ∑ ln [ (1 − pi) / (1 − qi) ]   (sum over i = 1, …, d)   + ln [ P(ω1) / P(ω2) ]
decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0
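A sketch of this linear discriminant for independent binary features; the per-feature probabilities pi, qi and the priors are assumed values chosen for illustration:

import numpy as np

# Assumed per-feature probabilities: p_i = P(x_i = 1 | w1), q_i = P(x_i = 1 | w2)
p = np.array([0.8, 0.7, 0.6])
q = np.array([0.3, 0.4, 0.5])
P_w1, P_w2 = 0.5, 0.5

w  = np.log(p * (1 - q) / (q * (1 - p)))                    # feature weights
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P_w1 / P_w2)

x = np.array([1, 0, 1])                                     # a binary feature vector
g = w @ x + w0
print("decide w1" if g > 0 else "decide w2")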