Introduction To Pattern Recognition
Reference book:
• “Pattern Classification” by Richard O. Duda,
Peter E. Hart and David G. Stork
Some Pattern Recognition tasks performed by humans in day-to-day life
• Recognize a face
• Understand spoken words
• Read handwritten characters
• Decide if an apple is ripe based on its smell
• Identify presence of car keys in our pocket by feel
What is Machine Learning?
Definition:
– Machine Learning is a field that strives to incorporate the learning ability into machines. [Machine Learning, Tom Mitchell]
– Machine Learning is a field that is concerned with building machines that can learn and improve automatically with experience.
Comment:
“We do not know yet how to make machines learn as well as humans do, but algorithms have been invented that are effective for certain types of machine-learning tasks.”
Some Pattern Recognition tasks we
would like machines to perform
• Speech Recognition
• Fingerprint Recognition
• Face Recognition
• Optical Character Recognition
• Object Identification/Classification
• DNA sequence identification
Classification vs Clustering
[Figures: example patterns grouped into Category “A” and Category “B”]
What is a Pattern?
• An object or event.
• Represented by a vector x of values corresponding to various features:
x = (x1, x2, ..., xn)
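For concreteness, a minimal sketch of a pattern as a feature vector (assuming Python/NumPy; the two fish features anticipate the sorting example later in these notes, and the numbers are made up):

```python
import numpy as np

# A pattern is just a vector of feature values: here a fish described
# by two hypothetical measurements.
x = np.array([4.2, 1.7])  # x1 = lightness, x2 = width (illustrative values)
n = x.shape[0]            # dimensionality of the pattern (n = 2)
```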
What is a Pattern? (cont’d)
• Loan/Credit card applications
– Income, # of dependents, mortgage amount → credit-worthiness classification
• Dating services
– Age, hobbies, income → “desirability” classification
• Web documents
– Key-word based descriptions (e.g., documents containing “football”, “NFL”) → document classification
What is a Class?
• A collection of “similar” objects.
Main Objectives
(1) Separate data belonging to different classes.
[Figure: gender classification example]
Main Approaches
x: input vector (pattern)
ω: class label (class)
• Generative
– Models the joint probability p(x, ω).
– Makes predictions by using Bayes’ rule to calculate P(ω|x).
– Picks the most likely class label ω.
• Discriminative
– Does not model p(x, ω).
– Estimates P(ω|x) by “learning” a direct mapping from x to ω (i.e., estimates the decision boundary).
– Picks the most likely class label ω.
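A minimal sketch of the generative route (assuming Python/SciPy, a single feature, and made-up Gaussian class-conditionals and priors; not a prescription from the book): Bayes’ rule turns p(x|ω) and P(ω) into the posterior P(ω|x).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D generative model: Gaussian class-conditionals p(x|w)
# and prior class probabilities P(w) for two classes w1, w2.
priors = {"w1": 0.6, "w2": 0.4}
likelihoods = {"w1": norm(loc=0.0, scale=1.0), "w2": norm(loc=3.0, scale=1.0)}

def posterior(x):
    """P(w|x) = p(x|w) P(w) / p(x), where p(x) is the normalizer."""
    joint = {w: likelihoods[w].pdf(x) * priors[w] for w in priors}  # p(x, w)
    evidence = sum(joint.values())                                  # p(x)
    return {w: j / evidence for w, j in joint.items()}

post = posterior(1.2)
print(post, "->", max(post, key=post.get))  # pick the most likely class
```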
How do we model p(x, ω)?
• Typically, using a statistical model.
– probability density function (e.g., Gaussian)
[Figure: gender classification example with class-conditional densities for “male” and “female”]
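A minimal sketch of what “modeling p(x|ω) with a Gaussian” amounts to in practice (assuming Python/NumPy; the feature and sample values are synthetic and purely illustrative):

```python
import numpy as np

# Synthetic 1-D feature samples observed for one class w.
samples = np.array([22.0, 25.5, 19.8, 27.1, 23.4, 21.0])

# Maximum-likelihood Gaussian fit: sample mean and standard deviation.
mu, sigma = samples.mean(), samples.std()

def gaussian_pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(f"p(x=24 | w) ≈ {gaussian_pdf(24.0):.4f}")
```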
Data Variability
• Intra-class variability
• Inter-class variability
Handwriting Recognition
License Plate Recognition
Biometric Recognition
Face Detection/Recognition
[Figures: face detection, matching, and recognition examples]
Fingerprint Classification
Important step for speeding up identification
Autonomous Systems
Obstacle detection and avoidance
Object recognition
Medical Applications
Skin Cancer Detection Breast Cancer Detection
Land Cover Classification
(using aerial or satellite images)
More Applications
• Recommendation systems
– e.g., Amazon, Netflix
• Email spam filters
• Malicious website detection
• Loan/Credit Card Applications
Main Phases in Pattern Recognition
[Figure: block diagram of the training and testing phases]
Complexity of PR – An Example
Problem: Sorting incoming fish on a conveyor belt.
Assumption: Two kinds of fish:
(1) sea bass
(2) salmon
[Figure: camera mounted over the conveyor belt]
Sensors
• Sensing:
– Use some kind of a sensor (e.g., camera, weight scale) for data capture.
– The PR system’s overall performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor being used.
Preprocessing
A critical step for reliable feature extraction!
Examples:
• Noise removal
• Image enhancement
• Separate touching or occluding fish
• Extract boundary of each fish
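A minimal sketch of two of these steps, noise removal and foreground separation (assuming Python with NumPy/SciPy, a grayscale image as a 2-D array, and an arbitrary global threshold; real preprocessing would be considerably more careful):

```python
import numpy as np
from scipy import ndimage

def preprocess(image):
    """Denoise a grayscale image and label candidate fish regions."""
    denoised = ndimage.median_filter(image, size=3)  # simple noise removal
    foreground = denoised > 0.5                      # assumed global threshold
    labels, count = ndimage.label(foreground)        # connected foreground regions
    return denoised, labels, count

# Toy usage on a random array standing in for a camera frame.
image = np.random.rand(64, 64)
_, labels, count = preprocess(image)
print("candidate regions:", count)
```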
Training/Test data
• How do we know that we have collected an adequately large and representative set of examples for training/testing the system?
Training Set?
Test Set?
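One common partial answer is to hold out part of the collected data for testing; a minimal sketch (assuming Python/NumPy and an 80/20 split, which is a conventional but arbitrary choice):

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split patterns X and labels y into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    return X[idx[n_test:]], y[idx[n_test:]], X[idx[:n_test]], y[idx[:n_test]]

X = np.arange(20).reshape(10, 2)  # ten toy patterns with two features each
y = np.arange(10)
X_tr, y_tr, X_te, y_te = train_test_split(X, y)
```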
Feature Extraction
• How to choose a good set of features?
– Discriminative features
[Figures: feature histograms with decision thresholds l* and x*; scatter plot of x1 (lightness) vs. x2 (width)]
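A minimal sketch of classifying with one discriminative feature and a threshold (assuming Python/NumPy; the threshold value x* and which class lies above it are illustrative assumptions, since in practice both come from the training histograms):

```python
import numpy as np

X_STAR = 5.0  # assumed lightness threshold x* (illustrative value)

def classify_by_lightness(lightness):
    """One-feature decision rule: compare lightness against x*."""
    return np.where(lightness > X_STAR, "sea bass", "salmon")  # assumed ordering

print(classify_by_lightness(np.array([3.1, 6.4, 5.2])))
```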
Multiple Features (cont’d)
• Does adding more features always help?
– It might be difficult and computationally expensive to extract more features.
– Correlated features might not improve performance (may cause redundancy).
• What are correlated features? When features that are meant to measure different characteristics are influenced by some common mechanism and tend to vary together, they are called correlated features (see the sketch below).
– Adding too many features can, paradoxically, lead to a worsening of performance (i.e., the “curse” of dimensionality).
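A minimal sketch of detecting such redundancy (assuming Python/NumPy; the two synthetic features are deliberately driven by one common underlying quantity):

```python
import numpy as np

rng = np.random.default_rng(0)
size = rng.uniform(1.0, 10.0, 200)  # common mechanism: overall fish size

length = size + rng.normal(0.0, 0.2, 200)       # feature 1, driven by size
width = 0.4 * size + rng.normal(0.0, 0.2, 200)  # feature 2, also driven by size

# A correlation coefficient near +/-1 signals redundancy: the second
# feature carries little information beyond the first.
print(np.corrcoef(length, width)[0, 1])
```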
Curse of Dimensionality
• One of the recurring problems encountered in applying statistical techniques to pattern recognition has been called the “curse of dimensionality.”
• The curse of dimensionality refers to a set of problems that arise when working with high-dimensional data.
• The dimension of a dataset is the number of attributes/features it contains; a dataset with a large number of attributes is referred to as high-dimensional.
• For example, methods that are analytically or computationally manageable in low-dimensional spaces can become completely impractical in a space of 50 or 100 dimensions.
• A model’s training time grows with a large number of features.
• An algorithm’s running time may increase (sometimes exponentially) with the number of features.
• In some cases, classification error may actually increase as features are added, making the exercise counterproductive.
• Dimensionality reduction is required to overcome this.
Example: Curse of Dimensionality
• The amount of training data needed can depend exponentially on the number of features.
• Example:
– Divide each of the input features into M intervals, so that the value of a feature can be specified approximately by saying in which interval it lies. With n features, the input space is divided into M^n cells, and each cell needs training examples to be represented.
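A minimal sketch of that exponential growth (assuming Python; M and the feature counts are illustrative):

```python
M = 10  # assumed number of intervals per feature

# With n features, the input space is divided into M**n cells, and each
# cell needs training examples to be represented.
for n in (1, 2, 3, 10, 50):
    print(f"{n:>2} features -> {M**n:.3e} cells")
```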
Missing Features
• Certain features might be missing (e.g., due to occlusion).
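A minimal sketch of one simple workaround, per-feature mean imputation (assuming Python/NumPy with NaN marking a missing value; more principled options exist, such as marginalizing over the missing feature):

```python
import numpy as np

def impute_mean(X):
    """Replace NaN entries with the per-feature mean."""
    col_means = np.nanmean(X, axis=0)  # feature means, ignoring NaNs
    return np.where(np.isnan(X), col_means, X)

X = np.array([[4.2, 1.7],
              [np.nan, 1.9],   # lightness missing (e.g., due to occlusion)
              [3.8, np.nan]])  # width missing
print(impute_mean(X))
```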
Decision Boundary
• How should we assign a given pattern to a class (i.e., “salmon” or “sea bass”)?
• Classifiers use the training data to partition the feature space into different regions (i.e., find the decision boundary).
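A minimal sketch of one simple way to induce such a partition, a nearest-class-mean rule (assuming Python/NumPy and synthetic 2-D data; under this rule the decision boundary is the perpendicular bisector of the segment joining the two class means):

```python
import numpy as np

rng = np.random.default_rng(1)
salmon = rng.normal([2.0, 4.0], 0.5, size=(50, 2))    # synthetic (x1, x2) patterns
sea_bass = rng.normal([5.0, 6.0], 0.5, size=(50, 2))

means = {"salmon": salmon.mean(axis=0), "sea bass": sea_bass.mean(axis=0)}

def classify(x):
    """Assign x to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda w: np.linalg.norm(x - means[w]))

print(classify(np.array([3.0, 4.5])))
```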
Overfitting
[Figures: overfitting examples]
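A minimal sketch of the phenomenon in one dimension (assuming Python/NumPy; polynomial regression stands in for a classifier, and all data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                             # true function

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # As the degree grows, training error keeps shrinking while test error rises.
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```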
Computational Complexity
• How does an algorithm scale with the number of:
– features
– training data
– categories
Would it be possible to build a
“general purpose” PR system?
To do
• Read Chapter 1 from “Pattern Classification”
by Richard O. Duda, Peter E. Hart and David G.
Stork