
Introduction to Pattern

Recognition

1
Reference book:
• “Pattern Classification” by Richard O. Duda,
Peter E. Hart and David G. Stork

2
Some Pattern Recognition tasks performed
by humans in day to day life
• Recognize a face
• Understand spoken words
• Read handwritten characters
• Decide if an apple is ripe based on its smell
• Identify presence of car keys in our pocket by feel

and many more …

3
What is Machine Learning?
Definition:
– Machine Learning is a field that strives to incorporate the ability to learn into
machines. [Machine Learning, Tom Mitchell]

– Machine Learning is a field that is concerned with building machines that can learn and
improve automatically with experience.

– Formal Definition of Machine Learning: “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.”

Comment :
“We do not know yet how to make machines learn as well as humans do, but algorithms
have been invented that are effective for certain types of machine-learning tasks.”
Some Pattern Recognition tasks we
would like machines to perform
• Speech Recognition
• Fingerprint Recognition
• Face Recognition
• Optical Character Recognition
• Object Identification/Classification
• DNA sequence identification

and many more …


5
Pattern Recognition/Classification
• A pattern is a regularity in the world, in human-made design,
or in abstract ideas.
• Pattern Recognition- the scientific discipline of taking in raw
data and assigning it to a category/class with the purpose of
performing a suitable action.
• Assign an object or an event (pattern) to one of several known
categories (or classes).

[Figure: example patterns assigned to Category “A” or Category “B”]

6
Classification vs Clustering

Classification (known categories) → Supervised Learning

Clustering (unknown categories) → Unsupervised Learning

[Figure: labeled patterns in Category “A” / Category “B” vs. unlabeled patterns grouped into clusters]

7
Classification vs Clustering

• Supervised learning (classification)


– Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations
– New data is classified based on the training set
• Unsupervised learning (clustering)
– Clustering is the process of grouping the data into classes or clusters,
so that objects within a cluster have high similarity in comparison to
one another but are quite dissimilar to objects in other clusters.
– The class labels of training data are unknown and number of classes
to be learned may not be known in advance.

8
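The supervised/unsupervised distinction above can be sketched in a few lines. This is a minimal illustration with hypothetical 1-D toy data (function names and values are not from the reference book): a nearest-centroid rule stands in for supervised classification, and a tiny two-cluster k-means stands in for unsupervised clustering.

```python
def nearest_centroid_classify(train, new_point):
    """Supervised: train is {label: [values]} with known class labels;
    assign new_point to the label whose mean (centroid) is closest."""
    centroids = {label: sum(vals) / len(vals) for label, vals in train.items()}
    return min(centroids, key=lambda lab: abs(new_point - centroids[lab]))

def two_means_cluster(points, iters=10):
    """Unsupervised: split unlabeled 1-D points into two clusters
    (a tiny k-means with k=2)."""
    c1, c2 = min(points), max(points)          # initial cluster centers
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

train = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.5, 4.5]}
print(nearest_centroid_classify(train, 1.1))   # nearest to the class "A" centroid
print(two_means_cluster([1.0, 1.2, 0.8, 5.0, 5.5, 4.5]))
```

With labels, the new value 1.1 is assigned to class “A”; the same six values, given without labels, are still recovered as two groups by similarity alone.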
What is a Pattern?
• An object or event.
• Represented by a vector x = [x1, x2, …, xn]^T of values
corresponding to various features.

[Figure: biometric patterns; hand gesture patterns]

9
What is a Pattern? (con’t)
• Loan/Credit card applications
– Income, # of dependents, mortgage amount  credit
worthiness classification

• Dating services
– Age, hobbies, income “desirability” classification

• Web documents
– Key-word based descriptions (e.g., documents
containing “football”, “NFL”) → document classification

10
What is a Class ?
• A collection of “similar” objects.

class: ‘Female’ class: ‘Male’

11
Main Objectives
(1) Separate data belonging to different classes.

(2) Assign new data to the correct class.

Gender Classification

12
Main Approaches
x: input vector (pattern)
ω: class label (class)

• Generative
– Models the joint probability, p(x, ω).
– Makes predictions by using Bayes rule to calculate P(ω|x).
– Pick the most likely class label ω.

• Discriminative
– Does not model p(x, ω).
– Estimates P(ω|x) by “learning” a direct mapping from x to ω (i.e.,
estimate decision boundary).
– Pick the most likely class label ω.
13
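The generative approach above can be sketched concretely. This is a minimal example with a hypothetical 1-D feature and made-up class statistics: model p(x|ω) as a Gaussian per class, apply Bayes rule to get the posterior P(ω|x), and pick the most likely class.

```python
import math

def gaussian_pdf(x, mean, std):
    """Class-conditional density p(x|ω), modeled as a Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def classify(x, classes):
    """classes: {label: (mean, std, prior)}. Returns (best_label, posteriors).
    Bayes rule: P(ω|x) = p(x|ω) P(ω) / p(x)."""
    joint = {lab: gaussian_pdf(x, m, s) * prior
             for lab, (m, s, prior) in classes.items()}
    evidence = sum(joint.values())                 # p(x), the normalizing constant
    posteriors = {lab: j / evidence for lab, j in joint.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Made-up statistics: salmon tend to be shorter than sea bass.
classes = {"salmon": (5.0, 1.0, 0.5), "sea bass": (10.0, 1.5, 0.5)}
label, post = classify(6.0, classes)
print(label, round(post[label], 3))
```

A pattern at x = 6.0 sits much closer to the salmon mean, so its posterior strongly favors “salmon”; the discriminative approach would instead learn the x → ω boundary directly without ever modeling these densities.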
How do we model p(x, ω)?
• Typically, using a statistical model.
– probability density function (e.g., Gaussian)

[Figure: Gender Classification: class-conditional densities for male and female]

14
Data Variability
• Intra-class variability

The letter “T” in different typefaces

• Inter-class variability

Letters/Numbers that look similar

• We typically deal with this issue by collecting a large
number of examples and a “good” set of features.
15
Some applications of Pattern Recognition

16
Handwriting Recognition

17
License Plate Recognition

18
Biometric Recognition

19
Face Detection/Recognition

Detection

Matching

Recognition

20
Fingerprint Classification
Important step for speeding up identification

21
Autonomous Systems
Obstacle detection and avoidance
Object recognition

22
Medical Applications
Skin Cancer Detection Breast Cancer Detection

23
Land Cover Classification
(using aerial or satellite images)

Many applications including “precision” agriculture.

24
More Applications

• Recommendation systems
– e.g., Amazon, Netflix
• Email spam filters
• Malicious website detection
• Loan/Credit Card Applications

25
Main Phases in Pattern Recognition
[Diagram: the two main phases: Training and Testing]

26
Complexity of PR – An Example
[Figure: camera above the conveyor belt]

Problem: Sorting
incoming fish on a
conveyor belt.

Assumption: Two
kinds of fish:
(1) sea bass
(2) salmon

27
Sensors
• Sensing:
– Use some kind of a
sensor (e.g., camera,
weight scale) for data
capture.
– PR’s overall performance
depends on bandwidth,
resolution, sensitivity,
distortion of the sensor
being used.

28
Preprocessing
A critical step for reliable feature extraction!

Examples:

• Noise removal

• Image enhancement

• Separate touching
or occluding fish

• Extract boundary of
each fish

29
Training/Test data
• How do we know that we have collected an
adequately large and representative set of
examples for training/testing the system?

Training Set ?

Test Set ?

30
Feature Extraction
• How to choose a good set of features?
– Discriminative features

– Invariant features (e.g., invariant to geometric
transformations such as translation, rotation, and
scale)
• Are there ways to automatically learn which
features are best?
31
Feature Extraction
• Let’s assume that a fisherman
told us that a sea bass is
generally longer than a
salmon.
• We can use length as a
feature and decide between
sea bass and salmon
according to a threshold on
length.
• How should we choose the
threshold?
32
Feature Extraction (cont’d)
Histogram of “length”

threshold l*

• Even though sea bass is longer than salmon on
average, there are many examples of fish
where this observation does not hold.
33
Feature Extraction (cont’d)
• Consider a different feature, e.g., “lightness”
Histogram of “lightness”

threshold x*

• It seems easier to choose the threshold x* but we still
cannot make a perfect decision.
34
Multiple Features
• To improve recognition accuracy, we might need
to use more than one feature.
– Single features might not yield the best performance.
– Using combinations of features might yield better
performance.

 x1  x1 : lightness
x  x2 : width
 2

35
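The benefit of combining features can be shown with a small sketch. The measurements below are made up for illustration: a nearest-centroid classifier (a hypothetical stand-in, not the book's method) scores higher on its training data when width is added to lightness.

```python
def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def classify(x, cents):
    """Assign x to the class with the nearest centroid (squared distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda lab: dist2(x, cents[lab]))

def accuracy(data, feats):
    """Restrict patterns to the chosen feature indices, then measure how
    many training patterns the nearest-centroid rule classifies correctly."""
    proj = {lab: [tuple(p[f] for f in feats) for p in pts]
            for lab, pts in data.items()}
    cents = {lab: centroid(pts) for lab, pts in proj.items()}
    total = sum(len(pts) for pts in proj.values())
    correct = sum(classify(p, cents) == lab
                  for lab, pts in proj.items() for p in pts)
    return correct / total

# Hypothetical (lightness, width) measurements per fish.
data = {"salmon": [(2, 3), (3, 4), (4, 3)],
        "sea bass": [(3, 1), (5, 2), (6, 1)]}
print(accuracy(data, [0]))      # lightness only
print(accuracy(data, [0, 1]))   # lightness + width
```

On this toy data the lightness values overlap, so the single feature misclassifies some fish; the two-feature version separates the classes cleanly.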
Multiple Features (cont’d)
• Does adding more features always help?
– It might be difficult and computationally expensive to
extract more features.
– Correlated features might not improve performance
(may cause redundancy).
• What are correlated features? When features that are meant
to measure different characteristics are influenced by some
common mechanism and tend to vary together, they are
called correlated features.
– Adding too many features can, paradoxically, lead to a worsening
of performance (i.e., “curse” of dimensionality).

36
Curse of Dimensionality
• One of the recurring problems encountered in applying statistical
techniques to pattern recognition problems has been called the “curse of
dimensionality.”
• Curse of Dimensionality refers to a set of problems that arise when
working with high-dimensional data.
• The dimension of a dataset corresponds to the number of
attributes/features that exist in a dataset. A dataset with a large number
of attributes is referred to as high dimensional data.
• For example, methods that are analytically or computationally
manageable in low-dimensional spaces can become completely
impractical in a space of 50 or 100 dimensions.
• A model’s training time will be higher with a large number of features
• Algorithm’s running time may increase (sometimes exponentially) with
higher number of features.
• In some cases, classification errors might increase, making the
extra features counter-productive
• Dimensionality reduction is required to overcome these problems
37
Example: Curse of Dimensionality
• The number of training data could depend exponentially
on the number of features.
• Example:
– Divide each of the input features into a number of intervals M, so that the value
of a feature can be specified approximately by saying in which interval it lies.

– The total number of cells will be M^D (D: # of features).


– Assuming uniform sampling, each cell should contain at least one data point, i.e.,
the number of training data grows exponentially with D.

38
Missing Features
• Certain features might be missing (e.g., due to occlusion).

• How should we train the classifier with missing features?
– Ignore tuples with missing values, or
– Substitute with suitable values
• Manual substitution
• Substitute with most probable values

• How should the classifier make the best decision with
missing features?

39
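Two of the training-time strategies above can be sketched in a few lines. This is a minimal illustration with hypothetical data where None marks a missing value; the mean is used here as a simple stand-in for "most probable value".

```python
def drop_incomplete(rows):
    """Strategy 1: ignore tuples that have any missing value."""
    return [r for r in rows if None not in r]

def mean_impute(rows):
    """Strategy 2: substitute a missing value with the mean of the
    observed values for that feature."""
    n_feats = len(rows[0])
    means = []
    for j in range(n_feats):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [tuple(means[j] if v is None else v for j, v in enumerate(r))
            for r in rows]

rows = [(1.0, 2.0), (None, 4.0), (3.0, None)]
print(drop_incomplete(rows))
print(mean_impute(rows))
```

Dropping tuples keeps only one of the three patterns here, which is why substitution is often preferred when data is scarce.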
Decision Boundary
• How should we assign a given pattern to a class
(i.e., “salmon” or “sea bass”)?
• Classifiers use the training data to partition the
feature space into different regions (i.e., find the
decision boundary).

How should we find an optimal decision boundary?

40
Decision Boundary (cont’d)
• Classifiers find the decision boundary by minimizing an
error function (e.g., classification error on the training
data).
• In general, we can get perfect classification results on the
training set by choosing a complex model.

41
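The simplest possible version of this idea, a 1-D threshold found by minimizing training error, can be sketched directly. The "lightness" values below are made up for illustration.

```python
def best_threshold(class_a, class_b):
    """Pick the decision boundary that minimizes classification error on
    the training data. Assumes class_a tends to have smaller values than
    class_b; candidate thresholds lie midway between adjacent values."""
    values = sorted(class_a + class_b)
    candidates = [(v1 + v2) / 2 for v1, v2 in zip(values, values[1:])]
    def errors(t):
        # class_a values above t and class_b values at or below t are mistakes
        return sum(x > t for x in class_a) + sum(x <= t for x in class_b)
    return min(candidates, key=errors)

# Hypothetical "lightness" measurements; the classes overlap slightly.
salmon = [1.0, 2.0, 2.5, 3.0]
sea_bass = [2.8, 4.0, 5.0, 6.0]
t = best_threshold(salmon, sea_bass)
print(t)
```

Because the classes overlap, even the best threshold leaves one training error; a zero-error boundary would require a more complex model, which leads to the overfitting issue on the next slide.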
Overfitting

• Complex models are tuned to the particular training data,
rather than to the characteristics of the true model (i.e.,
memorization or overfitting).
• Overfitting of data implies poor generalization!
• Classification models should be sufficiently generalized
to be of practical use in real world situations.
• How to detect overfitting in classification? If classification
accuracy is high on training set but deteriorates on
unseen examples, overfitting has occurred.

42
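The detection rule above, high training accuracy that deteriorates on unseen examples, can be sketched with a deliberately extreme model. All data and names here are synthetic: a lookup-table "classifier" memorizes its training set perfectly but falls back to a default guess on anything new.

```python
def memorizing_classifier(train):
    """Looks each pattern up in the training set; guesses "A" otherwise.
    An extreme example of a complex model tuned to the training data."""
    table = dict(train)
    return lambda x: table.get(x, "A")

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train = [(1, "A"), (2, "B"), (3, "A"), (4, "B")]
test = [(5, "B"), (6, "B"), (7, "A"), (8, "B")]

model = memorizing_classifier(train)
train_acc = accuracy(model, train)   # perfect: every example was memorized
test_acc = accuracy(model, test)     # poor: unseen inputs get the default guess
print(train_acc, test_acc)
print("overfitting" if train_acc - test_acc > 0.2 else "ok")
```

The large gap between training and test accuracy is exactly the symptom the slide describes; a model with good generalization keeps the two numbers close.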
Overfitting

43
Computational Complexity
• How does an algorithm scale with the
number of:
• features
• training data
• categories

• Need to consider tradeoffs between
computational complexity and performance.

44
Would it be possible to build a
“general purpose” PR system?

• It would be very difficult to design a system that is capable
of performing a variety of classification tasks.
– Different problems require different features.
– Different features might yield different solutions.
– Different tradeoffs exist for different problems.

45
To do
• Read Chapter 1 from “Pattern Classification”
by Richard O. Duda, Peter E. Hart and David G.
Stork

46
