0.PR Representation
The Classification Problem: We are given a collection of semantically labelled patterns, X, where

X = {(X1, chair), (X2, chair), (X3, human), (X4, human), (X5, human), (X6, chair), (X7, human), (X8, chair), (X9, human), (X10, human)}.
Clustering Problem: We are given a collection, X, of syntactically labelled patterns, where

X = {X1, X2, ..., Xn}.

Clustering partitions X into K clusters:

X = C1 ∪ C2 ∪ ... ∪ CK.

For example, with

X = {X1, X2, ..., X10},

a possible partition into two clusters is

C1 = {X3, X6, X9, X10},
C2 = {X1, X2, X4, X5, X7, X8}.
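
For concreteness, the two kinds of collections can be written out directly in code; a small sketch using the data above, with pattern identifiers kept as strings:

```python
# Classification: a semantically labelled collection.
labelled = [("X1", "chair"), ("X2", "chair"), ("X3", "human"),
            ("X4", "human"), ("X5", "human"), ("X6", "chair"),
            ("X7", "human"), ("X8", "chair"), ("X9", "human"),
            ("X10", "human")]

# Clustering: the labels are syntactic -- just cluster identities.
clusters = {
    "C1": {"X3", "X6", "X9", "X10"},
    "C2": {"X1", "X2", "X4", "X5", "X7", "X8"},
}

# The clusters partition X: together they cover X1, ..., X10.
assert set().union(*clusters.values()) == {f"X{i}" for i in range(1, 11)}
```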
Semi-Supervised Classification Problem: We are given a collection, X, consisting of both labelled and unlabelled patterns; the labelled patterns are exploited together with the unlabelled ones in building the classifier.
Semantic computing: Knowledge in different forms is used in clustering and classification to facilitate natural language understanding, software engineering, and information retrieval. There are plenty of other areas, such as agriculture, education, and economics, where pattern recognition tools are routinely used.
Abstractions

An abstraction of the data is easy for the human to comprehend and, for the machine, it reduces the computational burden in the form of the time and space required for processing. In addition, data mining tools are useful when the set of training patterns is large.
So, naturally, pattern recognition overlaps with machine learning, artificial intelligence and data mining.
Assignment

1. Consider the data of four adults, indicating their health status, shown in the following table. Devise a simple classifier that can properly classify all four patterns. How is a fifth adult, with a weight of 65 KGs, classified using this classifier?

Weight (in KGs)   Class label
50                Unhealthy
60                Healthy
70                Healthy
80                Unhealthy
item no   cost in Rs.   volume in cm3   colour   Class label
1         10            6               blue     inexpensive
2         15            6               blue     inexpensive
3         25            6               blue     inexpensive
4         150           1000            red      expensive
5         215           100             red      expensive
6         178           120             red      expensive
Different Paradigms for Pattern Recognition

There are several paradigms in use to solve the pattern recognition problem; the two main ones are statistical pattern recognition and syntactic pattern recognition. Of the two, statistical pattern recognition has been more popular and has received major attention in the literature. The main reason for this is that most practical problems in this area have to deal with noisy data and uncertainty, and statistics and probability are good tools for dealing with such problems. On the other hand, formal language theory provides the background for syntactic pattern recognition. Systems based on such linguistic tools, more often than not, are not ideally suited to noisy environments; however, they are powerful in dealing with well-structured domains. Also, interest in statistical pattern recognition has grown recently because of the influence of statistical learning theory.
There are several soft computing tools associated with this notion. Soft computing techniques are tolerant of imprecision, uncertainty and approximation. These tools include neural networks, evolutionary computation, and fuzzy set and rough set based pattern recognition schemes.
[Figure: the patterns X1, ..., X9 from classes X and +, together with the test pattern P, plotted in the (f1, f2) feature space.]
The pattern P is a new sample (a test sample) which has to be assigned to either class X or class +. There are different possibilities; some of them are: assign P the class label of its nearest neighbour (the nearest neighbour classifier, NNC); take a majority vote among its K nearest neighbours (the KNNC); or let each of the K nearest neighbours vote with a weight.
Suppose such a weighted classifier assigns a weight of 0.4 to the first neighbour (pattern X1, labelled X), a weight of 0.35 to the second neighbour (pattern X6 from class +) and a weight of 0.25 to the third neighbour (pattern X7 from class +). We first add the weights of the neighbours of P coming from the same class. The sum of the weights for class X, WX, is 0.4, as only the first neighbour is from X. The sum of the weights for class +, W+, is 0.6 (0.35 + 0.25), corresponding to the remaining two neighbours (X6 and X7) from class +. So, P is assigned the class label +. We discuss combinations of classifiers in Module 16.
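
A small sketch of this weighted vote (the weights and labels are those of the example above; the weighting scheme itself, typically a decreasing function of the distance to P, is assumed):

```python
from collections import defaultdict

def weighted_vote(neighbours):
    """Sum the neighbours' weights per class; return the class
    with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in neighbours:
        totals[label] += weight
    return max(totals, key=totals.get)

# The three nearest neighbours of P: (class label, weight).
neighbours = [("X", 0.40),  # X1, labelled X
              ("+", 0.35),  # X6, from class +
              ("+", 0.25)]  # X7, from class +

print(weighted_vote(neighbours))  # '+' since W+ = 0.6 > WX = 0.4
```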
In a system that is built to classify humans into tall, medium and short, the abstractions learnt from examples facilitate assigning one of these class labels (tall, medium or short) to a newly encountered human. Here, the class labels are semantic; they convey some meaning.
In the case of clustering, we can also group a collection of unlabelled patterns; in such a case, the label assigned to each group of patterns is syntactic, simply the cluster identity.
Often there is a training data set so large that it cannot be used directly for classification. In such a context, clustering can be used to generate abstractions of the data, and these abstractions can be used for classification. For example, the set of patterns corresponding to each of the classes can be clustered to form subclasses. Each such subclass (cluster) can be represented by a single prototypical pattern, and these representative patterns, rather than the entire data set, can be used to build the classifier. In Modules 14 and 15, a discussion on some of the popular clustering algorithms is presented.
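
A sketch of this prototype idea, assuming plain k-means for the clustering step and made-up two-dimensional data (any clustering algorithm producing representatives would serve):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """A bare-bones k-means; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each pattern to its nearest centroid ...
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # ... and recompute each centroid as the mean of its patterns.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids

# Hypothetical training data: two classes in the (f1, f2) space.
class_data = {
    "X": np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]),
    "+": np.array([[1.0, 5.0], [0.8, 5.2], [5.0, 1.0], [4.8, 1.2]]),
}

# Replace each class by k = 2 prototypes (one per subclass).
prototypes = [(label, c)
              for label, data in class_data.items()
              for c in kmeans(data, k=2)]

def classify(x):
    """Assign x the label of its nearest prototype, not of its
    nearest training pattern."""
    return min(prototypes, key=lambda p: np.linalg.norm(x - p[1]))[0]

print(classify(np.array([5.1, 4.9])))  # 'X'
```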
Importance of Representation
Such a similarity function is computed based on the representation of
patterns; the representation scheme plays a crucial role in classification.
Example

Consider categorizing humans into tall and short, where each class is represented using the feature Weight; suppose the class short is represented by a weight of 50 KGs. If a newly encountered person weighs 46 KGs, then he or she may be assigned the class label short because 46 is closer to 50. However, such an assignment does not appeal to us, because we know that weight does not correlate well with the class labels tall and short; a feature such as Height is more appropriate. Module 2 deals with representation of patterns and classes.
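
A minimal sketch of this nearest-representative assignment; only the value 50 KGs for short appears above, so the representative for tall is an assumed illustration:

```python
# Class representatives using the feature Weight (in KGs);
# the value for 'tall' is assumed for illustration.
representatives = {"short": 50.0, "tall": 80.0}

def assign(weight):
    """Assign the class whose representative weight is closest."""
    return min(representatives, key=lambda c: abs(representatives[c] - weight))

print(assign(46.0))  # 'short' -- even though Weight is a poor feature here
```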
It is important to study the theoretical limits of classifiers under uncertainty. The Bayes classifier characterizes optimality in terms of minimum error-rate classification; it is discussed in Module 10.
A decision tree is a transparent data structure for classifying patterns using both numerical and categorical features. We discuss decision tree classifiers in Module 11.
Using linear decision boundaries in high-dimensional spaces has gained a lot of prominence in the recent past. Support vector machines (SVMs) are built on this notion. In Modules 12 and 13, the role of SVMs in classification is explored.
It is meaningful to use more than one classifier to arrive at the class label of a new pattern. Such a combination of classifiers forms the basis for Module 16.
In Module 14, a discussion on some of the popular clustering algorithms is presented.
There are several challenges faced while clustering large datasets. In Module 15, some of these challenges are outlined and algorithms for clustering large datasets are presented.
Finally, we consider an application to document classification and retrieval in Module 17.
Assignment
1. Consider a collection of data items bought in a supermarket. The
features include cost of the item, size of the item and the class label.
The data is shown in the following table. Consider a new item with
cost = 34 and volume = 8. How do you classify this item using the
NNC? How about KNNC with K = 3?
2. Consider the problem of classifying objects into triangles and rectangles.
Which paradigm do you use? Provide an appropriate representation.
3. Consider a variant of the previous problem where the classes are small
circle and big circle. How do you classify such objects?
item no   cost in Rs.   volume in cm3   Class label
1         10            6               inexpensive
2         15            6               inexpensive
3         25            6               inexpensive
4         50            10              expensive
5         45            10              expensive
6         47            12              expensive
Further Reading

[1] V. Susheela Devi, M. Narasimha Murty, Pattern Recognition: An Introduction, Universities Press, Hyderabad, 2011.

[2] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley and Sons, 2000.
What is a pattern?

A pattern is a physical object, such as a chair, or an abstract notion, described in terms of a set of features.
What is classification?
Given a pattern, the task of identifying the class to which the pattern
belongs is called classification.
Generally, a set of patterns is given where the class label of each pattern
is known. This is known as the training data.
[Figure: the patterns X1, ..., X9 from classes X and +, together with the test pattern P, plotted in the (f1, f2) feature space.]
All of these ways of representing a pattern pertain to giving the values of the features used for that particular pattern.
For supervised learning, where a training set is given, each pattern in the training set also has its class label given.
Representing patterns as vectors
The most popular method of representing patterns is as vectors.
f1 f2 f3 f4 f5 f6 Class label
Pattern 1: 1 4 3 6 4 7 1
Pattern 2: 4 7 5 7 4 2 2
Pattern 3: 6 9 7 5 3 1 3
Pattern 4: 7 4 6 2 8 6 1
Pattern 5: 4 7 5 8 2 6 2
Pattern 6: 5 3 7 9 5 3 3
Pattern 7: 8 1 9 4 2 8 3
In this case, n = 7 and d = 6. As can be seen, each pattern has six attributes (or features). Each attribute in this case is a number between 1 and 9. The last number in each line gives the class of the pattern; in this case, the class of each pattern is either 1, 2 or 3.
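
These patterns can be held as the rows of an n x d matrix with a parallel vector of class labels; a sketch using NumPy with the data from the table:

```python
import numpy as np

# Each row is one pattern (d = 6 features); labels are kept separately.
X = np.array([[1, 4, 3, 6, 4, 7],
              [4, 7, 5, 7, 4, 2],
              [6, 9, 7, 5, 3, 1],
              [7, 4, 6, 2, 8, 6],
              [4, 7, 5, 8, 2, 6],
              [5, 3, 7, 9, 5, 3],
              [8, 1, 9, 4, 2, 8]])
y = np.array([1, 2, 3, 1, 2, 3, 3])

n, d = X.shape
print(n, d)                      # 7 6
print(sorted(set(y.tolist())))   # [1, 2, 3]
```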
[Figure 2: the patterns, given as (f1, f2, class label) triplets, plotted in the (f1, f2) feature space.]
Each triplet consists of feature 1, feature 2 and the class label. This is
shown in Figure 2.
Patterns may also be represented as strings. For example, a fragment of a DNA sequence may be written as

GTGCATCTGACTCCT...

The corresponding RNA is expressed as

GUGCAUCUGACUCCU...

and the corresponding protein sequence as

VHLTPEEK...
Patterns may also be represented by logical descriptions, such as rules. An example would be

if (beak(x) = red) and (colour(x) = green) then parrot(x)

This is a rule where the antecedent is a conjunction of primitives and the consequent is the class label.
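
Such a rule translates directly into executable form; a sketch where a pattern is a dictionary of primitive feature values (the feature names follow the rule above):

```python
def is_parrot(x):
    """Antecedent: a conjunction of primitives;
    consequent: the class label parrot."""
    return x.get("beak") == "red" and x.get("colour") == "green"

bird = {"beak": "red", "colour": "green"}
print(is_parrot(bird))  # True -> classify as parrot
```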
Features can also take linguistic values. For example, linguistic values for height can be tall, medium and short; these are very subjective and can be modelled by fuzzy membership values.
A feature in a pattern may be represented by an interval instead of a single number; the interval gives a range in which that feature falls. An example of this would be a pattern such as

(2, medium, 4.5, [1, 10])

This pattern has four features; the fourth feature is in the form of an interval, indicating that the feature falls within the range 1 to 10. Intervals are also used when there are missing values: when a particular feature of a pattern is missing, we can look at the other patterns to find a range of values which this feature can take, and represent it as an interval.

In this example pattern, the second feature is a linguistic value, the first feature is an integer and the third feature is a real value.
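
One way to hold such a mixed pattern in code, mirroring the illustrative pattern above (an integer, a linguistic value, a real value and an interval, with the interval stored as a (low, high) pair):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Pattern:
    f1: int                   # integer feature
    f2: str                   # linguistic feature, e.g. 'medium'
    f3: float                 # real-valued feature
    f4: Tuple[float, float]   # interval feature: (low, high)

p = Pattern(f1=2, f2="medium", f3=4.5, f4=(1.0, 10.0))

# An interval can also stand in for a missing value once a
# plausible range is inferred from the other patterns.
low, high = p.f4
print(low <= 7 <= high)  # True: 7 lies within the interval [1, 10]
```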
Rough sets are used to represent classes; a class description then consists of a lower approximation and an upper approximation. An element y belongs to the lower approximation if the equivalence class to which y belongs is included in the set. On the other hand, y belongs to the upper approximation of the set if its equivalence class has a non-empty intersection with the set. The lower approximation consists of objects which are members of the set with full certainty; the upper approximation consists of objects which may possibly belong to the set.
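
Both approximations follow mechanically from the equivalence classes; a sketch with a made-up universe of six objects and a made-up partition:

```python
def approximations(equiv_classes, target):
    """Return the lower and upper approximations of `target`
    with respect to a partition into equivalence classes."""
    lower, upper = set(), set()
    for eq in equiv_classes:
        if eq <= target:   # wholly contained: members with certainty
            lower |= eq
        if eq & target:    # non-empty intersection: possible members
            upper |= eq
    return lower, upper

equiv_classes = [{1, 2}, {3, 4}, {5, 6}]   # hypothetical partition
target = {1, 2, 3}                         # the class to approximate

lower, upper = approximations(equiv_classes, target)
print(lower)   # {1, 2}        -- certain members
print(upper)   # {1, 2, 3, 4}  -- possible members
```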
Not just the features: each pattern can also have grades of membership to every class, instead of belonging to exactly one class. In other words, each
pattern has a fuzzy label, which consists of c values in [0, 1], where each component gives the grade of membership of the pattern to one class; here c is the number of classes. For example, consider a collection of documents. It is possible that each document is associated with more than one category: one paragraph in a document may, for instance, be associated with sport and another with politics.
The classes themselves can also be fuzzy. One example of this would be to have linguistic values for classes: the classes for a set of patterns can be small and big. These classes are fuzzy in nature, as the perception of small and big differs from person to person.
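
A fuzzy label is just a vector of c membership grades in [0, 1], one per class; a sketch for the document example above, with illustrative membership values:

```python
# c = 2 categories for the document example.
categories = ["sport", "politics"]

# One document's fuzzy label: its grade of membership to each
# category (the values here are illustrative).
doc_label = [0.7, 0.3]

for category, grade in zip(categories, doc_label):
    print(f"{category}: {grade}")   # sport: 0.7 / politics: 0.3
```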