
Lecture 4


MODULE 3

Representation of Patterns and Classes

LESSON 3
Different Representation Schemes

Keywords: Vector, String, Logical, Representations

What is a pattern?

• A pattern represents a physical object or an abstract notion. For
example, the pattern may represent physical objects like balls, animals
or furniture. An abstract notion could be whether a person will play
tennis or not (depending on features such as the weather).

• It gives the description of the object or the notion.

• The description is given in the form of attributes of the object.

• These are also called the features of the object.

What are classes?

• The patterns belong to two or more classes.

• The task of pattern recognition pertains to finding the class to which
a pattern belongs.

• The attributes or features used to represent the patterns should be
discriminatory attributes. This means that they help in classifying the
patterns.

• The task of finding the discriminatory features is called feature
extraction/selection.

What is classification?

• Given a pattern, the task of identifying the class to which the pattern
belongs is called classification.

• Generally, a set of patterns is given where the class label of each pattern
is known. This is known as the training data.

• The information in the training data should be used to identify the
class of the test pattern.

Figure 1: Dataset of two classes (axes f1 and f2; training patterns X1 to
X9 and the test pattern P)

• This type of classification, where a training set is used, is called
supervised learning. In supervised learning, we learn the values of the
features for each class from the training set and use this information
to classify a given pattern.

Consider the patterns of two classes given in Figure 1. This is the
training data. Using the training data, we can classify the pattern P.
The information about the two classes available in the training data can
be used to carry out this classification. A number of classifiers carry
out supervised classification, such as nearest neighbour and related
algorithms, the Bayes classifier, decision trees, SVMs and neural
networks; these are discussed in later modules.
Representation of patterns

• Patterns can be represented in a number of ways.

• Every representation amounts to giving the values of the features used
for that particular pattern.

• For supervised learning, where a training set is given, each pattern in
the training set also carries its class label.

Representing patterns as vectors

• The most popular method of representing patterns is as vectors.

• Here, the training dataset may be represented as a matrix of size n x d,
where each row corresponds to a pattern and each column represents
a feature.

• Each attribute/feature/variable is associated with a domain. A domain
is the set of numbers that the attribute can take; each pattern has one
value from this domain for that attribute.

• The class label is a dependent attribute which depends on the d
independent attributes.
Example

The dataset could be as follows:

             f1  f2  f3  f4  f5  f6  Class label
Pattern 1:    1   4   3   6   4   7       1
Pattern 2:    4   7   5   7   4   2       2
Pattern 3:    6   9   7   5   3   1       3
Pattern 4:    7   4   6   2   8   6       1
Pattern 5:    4   7   5   8   2   6       2
Pattern 6:    5   3   7   9   5   3       3
Pattern 7:    8   1   9   4   2   8       3

In this case, n = 7 and d = 6. As can be seen, each pattern has six
attributes (or features). Each attribute in this case is a number between
1 and 9. The last number in each line gives the class of the pattern. In
this case, the class of each pattern is 1, 2 or 3.
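
As a minimal sketch of this vector representation, the example dataset can be held as a list of rows and split into a pattern matrix and a label vector. The names `rows`, `X` and `y` are assumptions made for illustration; they do not come from the lecture.

```python
# Training set from the example: each row is (f1, ..., f6, class label).
rows = [
    (1, 4, 3, 6, 4, 7, 1),
    (4, 7, 5, 7, 4, 2, 2),
    (6, 9, 7, 5, 3, 1, 3),
    (7, 4, 6, 2, 8, 6, 1),
    (4, 7, 5, 8, 2, 6, 2),
    (5, 3, 7, 9, 5, 3, 3),
    (8, 1, 9, 4, 2, 8, 3),
]

# Separate the d independent attributes from the dependent class label.
X = [list(r[:-1]) for r in rows]  # n x d pattern matrix (illustrative name)
y = [r[-1] for r in rows]         # n class labels (illustrative name)

n = len(X)               # number of patterns
d = len(X[0])            # number of features per pattern
classes = sorted(set(y)) # distinct class labels

print(n, d, classes)  # 7 6 [1, 2, 3]
```

Each row of `X` is one pattern vector and each column is one feature, which matches the n x d layout described above.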

• If the patterns are two- or three-dimensional, they can be plotted.

• Consider the dataset

Figure 2: Dataset of three classes (axes f1 and f2; patterns 1 to 12)

Pattern 1 : (1,1.25,1)      Pattern 2 : (1,1,1)
Pattern 3 : (1.5,0.75,1)    Pattern 4 : (2,1,1)
Pattern 5 : (1,3,2)         Pattern 6 : (1,4,2)
Pattern 7 : (1.5,3.5,2)     Pattern 8 : (2,3,2)
Pattern 9 : (4,2,3)         Pattern 10 : (4.5,1.5,3)
Pattern 11 : (5,1,3)        Pattern 12 : (5,2,3)

Each triplet consists of feature 1, feature 2 and the class label. This is
shown in Figure 2.
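
To make the idea of classifying with labelled training data concrete, here is a minimal sketch of a nearest neighbour classifier applied to the twelve patterns above. Nearest neighbour is only one of the classifiers the lesson names, and the function name and the test points are assumptions chosen for illustration, not part of the lecture.

```python
import math

# The twelve patterns listed above as (f1, f2, class label).
training = [
    (1, 1.25, 1), (1, 1, 1), (1.5, 0.75, 1), (2, 1, 1),
    (1, 3, 2), (1, 4, 2), (1.5, 3.5, 2), (2, 3, 2),
    (4, 2, 3), (4.5, 1.5, 3), (5, 1, 3), (5, 2, 3),
]

def nearest_neighbour(test_point, patterns):
    """Return the class label of the training pattern closest to test_point."""
    def distance(p):
        # Euclidean distance in the (f1, f2) plane.
        return math.hypot(test_point[0] - p[0], test_point[1] - p[1])
    return min(patterns, key=distance)[2]

# Hypothetical test patterns, one near each cluster.
print(nearest_neighbour((1.4, 0.9), training))  # 1
print(nearest_neighbour((1.2, 3.2), training))  # 2
print(nearest_neighbour((4.2, 1.8), training))  # 3
```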

Representing patterns as strings


• Here each pattern is a string of characters from an alphabet.

• This is generally used to represent gene and protein sequences.

• For example, DNA can be represented as

GTGCATCTGACTCCT...

RNA is expressed as

GUGCAUCUGACUCCU...

This can be translated into protein which would be of the form

VHLTPEEK...

• Each string of characters represents a pattern. Operations like pattern
matching or finding the similarity between strings are carried out with
these patterns.

• More details on proteins and genes can be found in [1].
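
The string operations mentioned above can be sketched as follows. The codon table is deliberately partial (only the five codons that occur in the fragment shown), and the Hamming distance is just one possible similarity measure chosen here for illustration; the lecture does not prescribe one. All function names are assumptions.

```python
# Partial codon table: only the codons that occur in the example fragment.
CODON_TO_AMINO_ACID = {
    "GUG": "V", "CAU": "H", "CUG": "L", "ACU": "T", "CCU": "P",
}

def transcribe(dna):
    """Rewrite a DNA string in the RNA alphabet (T becomes U)."""
    return dna.replace("T", "U")

def translate(rna):
    """Translate complete codons; codons missing from the partial table give '?'."""
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_AMINO_ACID.get(c, "?") for c in codons)

def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

dna = "GTGCATCTGACTCCT"
rna = transcribe(dna)
print(rna)             # GUGCAUCUGACUCCU
print(translate(rna))  # VHLTP
print(hamming_distance(dna, "GTGCATCTGACTCCA"))  # 1
```

The fifteen bases shown give only the first five amino acids; the remaining letters of the protein string in the text come from bases hidden behind the ellipsis.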

Representing patterns by using logical operators

• Here each pattern is represented by a sentence (well-formed formula) in
a logic.

• An example would be

if (beak(x) = red) and (colour(x) = green) then parrot(x)

This is a rule where the antecedent is a conjunction of primitives and
the consequent is the class label.

• Another example would be

if (has-trunk(x)) and (colour(x) = black) and (size(x) = large) then
elephant(x)
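
One way to read the two rules above is as predicates over an attribute dictionary. The sketch below assumes a pattern is a Python dict with keys such as `beak`, `colour`, `size` and `has_trunk`; these key names and the `classify` helper are illustrative assumptions.

```python
def is_parrot(x):
    """Antecedent of: if (beak(x) = red) and (colour(x) = green) then parrot(x)."""
    return x.get("beak") == "red" and x.get("colour") == "green"

def is_elephant(x):
    """Antecedent of the elephant rule: has-trunk, black colour, large size."""
    return (bool(x.get("has_trunk"))
            and x.get("colour") == "black"
            and x.get("size") == "large")

# Each rule pairs a class label (the consequent) with its antecedent.
RULES = [("parrot", is_parrot), ("elephant", is_elephant)]

def classify(x):
    """Return the consequent of the first rule whose antecedent holds, else None."""
    for label, antecedent in RULES:
        if antecedent(x):
            return label
    return None

print(classify({"beak": "red", "colour": "green"}))                       # parrot
print(classify({"has_trunk": True, "colour": "black", "size": "large"}))  # elephant
print(classify({"beak": "yellow", "colour": "green"}))                    # None
```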

Representing patterns using fuzzy and rough sets

• The features in a fuzzy pattern may consist of linguistic values, fuzzy
numbers and intervals.

• For example, the linguistic values for height can be tall, medium and
short. These are very subjective and can be modelled by fuzzy membership
values.

• A feature in the pattern may be represented by an interval instead of a
single number. This gives a range in which that feature falls. An
example of this would be the pattern

(3, small, 6.5, [1, 10])

The above example gives a pattern with 4 features. The 4th feature
is in the form of an interval: the feature falls within the range 1 to
10. Intervals are also used when there are missing values. When a
particular feature of a pattern is missing, we can look at the other
patterns to find a range of values which this feature can take, and
represent that range as an interval.

In the example pattern given above, the first feature is an integer, the
second is a linguistic value and the third is a real value.
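
The two ideas above, fuzzy membership for a linguistic value and an interval for a missing feature, can be sketched as follows. The triangular shape and the breakpoints of 150, 165 and 180 cm for "medium" height are purely hypothetical choices, as are the function names; the lecture does not specify a membership function.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or outside left/right, rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def medium_height(cm):
    """Hypothetical membership function for the linguistic value 'medium'."""
    return triangular(cm, 150.0, 165.0, 180.0)

def interval_from_others(values):
    """Interval (min, max) of a feature over the patterns where it is known."""
    known = [v for v in values if v is not None]
    return (min(known), max(known))

# The example pattern: integer, linguistic value, real value, interval.
pattern = (3, "small", 6.5, (1, 10))

print(medium_height(165.0))  # 1.0
print(medium_height(157.5))  # 0.5
print(interval_from_others([4, None, 1, 10, 7]))  # (1, 10)
```

Here `None` stands for a missing value, and the interval is taken from the values observed in the other patterns.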

• Rough sets are used to represent classes. A class description consists
of an upper approximation and a lower approximation of the set of
objects in the class. An element y belongs to the lower approximation if
the equivalence class to which y belongs is included in the set. On the
other hand, y belongs to the upper approximation if its equivalence
class has a non-empty intersection with the set. The lower approximation
consists of objects which are members of the set with full certainty.
The upper approximation consists of objects which may possibly belong to
the set.

• For example, consider Figure 3. This represents an object whose
location can be found using the grid shown. The object completely covers
the cells (A3,B2), (A3,B3), (A4,B2) and (A4,B3). It falls partially in
(A2,B1), (A2,B2), (A2,B3), (A2,B4), (A3,B1), (A3,B4), (A4,B1), (A4,B4),
(A5,B2) and (A5,B3). The pattern can be represented as a rough set in
which the four completely covered cells give the lower approximation,
and these four cells together with the partially covered cells form the
upper approximation.
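
A minimal sketch of the grid example using Python sets. It follows the definition that the upper approximation contains every cell with a non-empty intersection with the object, so it includes the fully covered cells as well as the partially covered ones. The variable names are assumptions made for illustration.

```python
# Grid cells from the example, written as (row, column).
fully_covered = {("A3", "B2"), ("A3", "B3"), ("A4", "B2"), ("A4", "B3")}
partially_covered = {
    ("A2", "B1"), ("A2", "B2"), ("A2", "B3"), ("A2", "B4"),
    ("A3", "B1"), ("A3", "B4"), ("A4", "B1"), ("A4", "B4"),
    ("A5", "B2"), ("A5", "B3"),
}

# Lower approximation: cells that lie entirely inside the object.
lower = set(fully_covered)

# Upper approximation: every cell that the object touches at all.
upper = fully_covered | partially_covered

# Boundary region: cells that only possibly belong to the object.
boundary = upper - lower

print(len(lower), len(upper), len(boundary))  # 4 14 10
```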

Figure 3: Representation of an object (grid with rows A1 to A5 and columns
B1 to B4)

• Not only the features: each pattern can also have grades of membership
to every class instead of belonging to one class. In other words, each
pattern has a fuzzy label which consists of c values in [0,1], where each
component gives the grade of membership of the pattern to one class.
Here c is the number of classes. For example, consider a collection of
documents. Each document may be associated with more than one category:
one paragraph in a document, for instance, may be associated with sport
and another with politics.

• The classes can also be fuzzy. One example of this would be to have
linguistic values for classes. The classes for a set of patterns can be
small and big. These classes are fuzzy in nature as the perception of
small and big is different for different people.

References
[1] Andreas D. Baxevanis (Ed.), B. F. Francis Ouellette (Ed.),
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins,
John Wiley and Sons, 3rd Edition, October 2004.
