Lecture 4
LESSON 3
Different Representation Schemes
What is a pattern?
What is classification?
• Given a pattern, the task of identifying the class to which the pattern
belongs is called classification.
• Generally, a set of patterns is given where the class label of each pattern
is known. This is known as the training data.
[Figure: patterns X1 to X9 and a point P plotted in the f1-f2 feature plane]
• All of these ways pertain to giving the values of the features used for
that particular pattern.
• For supervised learning, where a training set is given, each pattern in
the training set will also have the class of the pattern given.
Representing patterns as vectors
• The most popular method of representing patterns is as vectors.
• The class label is a dependent attribute which depends on the ‘d’
independent attributes.
Example
             f1  f2  f3  f4  f5  f6   Class label
Pattern 1:    1   4   3   6   4   7        1
Pattern 2:    4   7   5   7   4   2        2
Pattern 3:    6   9   7   5   3   1        3
Pattern 4:    7   4   6   2   8   6        1
Pattern 5:    4   7   5   8   2   6        2
Pattern 6:    5   3   7   9   5   3        3
Pattern 7:    8   1   9   4   2   8        3
In this case, n = 7 and d = 6, where n is the number of patterns and d is
the number of features. As can be seen, each pattern has six attributes (or
features). Each attribute in this case is a number between 1 and 9. The
last number in each line gives the class of the pattern. In this case, the class
of each pattern is 1, 2 or 3.
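The vector representation above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes; the variable names (training_data, features, labels) are assumptions chosen for readability. Each pattern is stored as a tuple whose last entry is the class label.

```python
# Each row is one pattern: six feature values followed by the class label.
training_data = [
    (1, 4, 3, 6, 4, 7, 1),
    (4, 7, 5, 7, 4, 2, 2),
    (6, 9, 7, 5, 3, 1, 3),
    (7, 4, 6, 2, 8, 6, 1),
    (4, 7, 5, 8, 2, 6, 2),
    (5, 3, 7, 9, 5, 3, 3),
    (8, 1, 9, 4, 2, 8, 3),
]

# Split every row into its feature vector and its class label.
features = [row[:-1] for row in training_data]
labels = [row[-1] for row in training_data]

n = len(features)              # number of patterns
d = len(features[0])           # number of features per pattern
classes = sorted(set(labels))  # distinct class labels

print(n, d, classes)
```

Keeping the label as the last entry of each row mirrors the table; splitting it off gives the d independent attributes and the dependent attribute separately.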
[Figure 2: data points numbered 1 to 12 plotted with feature f1 on the horizontal axis and feature f2 on the vertical axis]
Each triplet consists of feature 1, feature 2 and the class label. This is
shown in Figure 2.
A DNA sequence is expressed as a string such as
GTGCATCTGACTCCT...
RNA is expressed as
GUGCAUCUGACUCCU...
A protein sequence is expressed as
VHLTPEEK...
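As a small illustration of patterns represented as strings, the sketch below derives the RNA string from the DNA string by replacing every T with U, then translates the result using a deliberately partial codon table. The function names and the table name are assumptions made for this example. Only the five codons that occur in this fragment are included, so it is not a general translator.

```python
def transcribe(dna: str) -> str:
    """Return the RNA string obtained by replacing every T with U."""
    return dna.upper().replace("T", "U")


# Partial codon table: only the five codons needed for this fragment.
CODON_TO_AMINO_ACID = {
    "GUG": "V",
    "CAU": "H",
    "CUG": "L",
    "ACU": "T",
    "CCU": "P",
}


def translate(rna: str) -> str:
    """Translate complete codons; codons missing from the table become '?'."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing incomplete codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_AMINO_ACID.get(codon, "?") for codon in codons)


dna = "GTGCATCTGACTCCT"
rna = transcribe(dna)
print(rna)             # GUGCAUCUGACUCCU
print(translate(rna))  # VHLTP
```

The five codons of this fragment spell VHLTP, which matches the first five letters of the protein string quoted above.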
• An example would be
if (beak(x) = red) and (colour(x) = green) then parrot(x)
This is a rule where the antecedent is a conjunction of primitives and
the consequent is the class label.
• For example, linguistic values for height can be tall, medium and short.
Such values are very subjective and can be modelled by fuzzy membership
values.
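One way to model such linguistic values is with piecewise-linear membership functions. The sketch below is only an illustration: the breakpoints (150, 170 and 190 cm) and the function names are assumptions chosen for the example, not values taken from these notes.

```python
def short(height_cm: float) -> float:
    """Membership in 'short': 1 up to 150 cm, falling linearly to 0 at 170 cm."""
    if height_cm <= 150:
        return 1.0
    if height_cm >= 170:
        return 0.0
    return (170 - height_cm) / 20


def medium(height_cm: float) -> float:
    """Membership in 'medium': a triangle that peaks at 170 cm."""
    if height_cm <= 150 or height_cm >= 190:
        return 0.0
    if height_cm <= 170:
        return (height_cm - 150) / 20
    return (190 - height_cm) / 20


def tall(height_cm: float) -> float:
    """Membership in 'tall': 0 up to 170 cm, rising linearly to 1 at 190 cm."""
    if height_cm <= 170:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 170) / 20


h = 160
print(short(h), medium(h), tall(h))  # 0.5 0.5 0.0
```

A height of 160 cm is therefore partly short and partly medium, which captures the subjectivity that a single crisp threshold would hide.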
• A feature in the pattern may be represented by an interval instead of a
single number. This would give a range in which that feature falls. An
example of this would be the pattern
The above example gives a pattern with 4 features. The 4th feature
is in the form of an interval. In this case the feature falls within the
range 1 to 10. This is also used when there are missing values. When
a particular feature of a pattern is missing, looking at other patterns,
we can find a range of values which this feature can take. This can be
represented as an interval.
The example pattern given above has the second feature as a linguistic
value. The first feature is an integer and the third feature is a real
value.
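A pattern with mixed feature types, including an interval, might be stored as sketched below. The concrete values are hypothetical, since the original example pattern is not reproduced here; only the types (integer, linguistic value, real value, interval from 1 to 10) follow the description. The helper interval_from_known_values is likewise an assumption showing one simple way to obtain a range for a missing value from other patterns.

```python
# Hypothetical 4-feature pattern: integer, linguistic value, real value,
# and an interval stored as a (low, high) pair.
pattern = (3, "small", 6.5, (1, 10))


def interval_contains(interval, value):
    """Return True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def interval_from_known_values(patterns, index):
    """Range of one feature over the patterns where it is present (not None)."""
    known = [p[index] for p in patterns if p[index] is not None]
    return (min(known), max(known))


# The second pattern is missing its second feature (marked with None).
data = [(1.0, 4.0), (2.5, None), (0.5, 7.0), (3.0, 5.5)]
missing_range = interval_from_known_values(data, 1)

print(interval_contains(pattern[3], 7))  # True
print(missing_range)                     # (4.0, 7.0)
```

Using the minimum and maximum over the other patterns is the crudest possible choice; a narrower range could be obtained by looking only at similar patterns.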
• Rough sets are used to represent classes. So, a class description will
consist of an upper approximation and a lower approximation. An
element y belongs to the lower approximation if the equivalence class
to which y belongs is included in the set. On the other hand y belongs
to the upper approximation of the set if its equivalence class has a non-
empty intersection with the set. The lower approximation consists of
objects which are members of the set with full certainty. The upper
approximation consists of objects which may possibly belong to the set.
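The two approximations can be computed directly from their definitions. The following is a minimal sketch, assuming the equivalence classes are given as a list of Python sets; the function name approximations and the sample data are assumptions made for illustration.

```python
def approximations(equivalence_classes, target):
    """Return the (lower, upper) approximations of target.

    lower: union of the equivalence classes fully contained in target.
    upper: union of the equivalence classes that intersect target.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in equivalence_classes:
        block = set(block)
        if block <= target:   # the whole class lies inside the set
            lower |= block
        if block & target:    # the class shares at least one element
            upper |= block
    return lower, upper


# Hypothetical universe of six objects split into three equivalence classes.
blocks = [{1, 2}, {3, 4}, {5, 6}]
target = {1, 2, 3}

lower, upper = approximations(blocks, target)
print(lower)  # {1, 2}
print(upper)  # {1, 2, 3, 4}
```

Objects 1 and 2 belong to the set with full certainty. Object 4 only possibly belongs, because it cannot be distinguished from object 3, which is in the set.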
[Figure: grid with rows A1 to A5 and columns B1 to B4]
• Not only the features but also the class label can be fuzzy: each
pattern can have grades of membership to every class instead of
belonging to one class. In other words, each pattern has a fuzzy label
which consists of c values in [0,1], where each component gives the
grade of membership of the pattern to one class.
Here c gives the number of classes. For example, consider a collection of
documents. It is possible that each of the documents may be associated
with more than one category. A paragraph in a document, for instance,
may be associated with sport and another with politics.
• The classes can also be fuzzy. One example of this would be to have
linguistic values for classes. The classes for a set of patterns can be
small and big. These classes are fuzzy in nature as the perception of
small and big is different for different people.
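A fuzzy label with c components can be sketched as a mapping from class names to grades of membership. The document categories follow the example above, while the numeric grades and the helper names are assumptions chosen for illustration.

```python
classes = ["sport", "politics"]

# Hypothetical fuzzy label of one document: one grade in [0, 1] per class.
fuzzy_label = {"sport": 0.7, "politics": 0.3}


def is_valid_fuzzy_label(label, class_names):
    """Check that the label has one grade in [0, 1] for every class."""
    return set(label) == set(class_names) and all(
        0.0 <= grade <= 1.0 for grade in label.values()
    )


def most_likely_class(label):
    """Collapse a fuzzy label to the single class with the highest grade."""
    return max(label, key=label.get)


print(is_valid_fuzzy_label(fuzzy_label, classes))  # True
print(most_likely_class(fuzzy_label))              # sport
```

The check does not require the grades to sum to 1, since the text only requires each grade to lie in [0,1].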