Pattern Classification With Principal Component Analysis and Fuzzy Rule Bases
a Lehrstuhl für Unternehmensforschung, RWTH, Templergraben 64, D-52056 Aachen, Germany
b Computer Center, Indian Institute of Chemical Technology, Uppal Road, Hyderabad 500 007, Andhra Pradesh, India
Received 8 December 1998; accepted 13 April 1999
Abstract
For the first time, principal component analysis has been used to reduce the feature space dimension in fuzzy rule based pattern classifiers. A modified threshold accepting algorithm (MTA) proposed elsewhere by V. Ravi and H.-J. Zimmermann [European Journal of Operational Research 123 (1) (2000) 16–28] has been used to minimize the number of rules in the classifier while guaranteeing high classification power. The proposed methodology has been demonstrated for (1) the wine classification problem, which has 13 features, and (2) the Wisconsin breast cancer determination problem, which has 9 features. The influence of the type of aggregator used in the classification algorithm and of the number of partitions used for each of the feature spaces is also studied. The results are encouraging: there is no reduction in the classification power in either problem, despite the fact that some of the principal components were deleted from the study before invoking the classifier. On the contrary, the first five principal components yielded 100% classification power in some cases for both problems. The high classification power obtained for both problems while working with a reduced feature space dimension is the significant outcome of this study. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Fuzzy sets; Data analysis; Feature selection; Principal component analysis; Modified threshold accepting
1. Introduction
PII: S0377-2217(99)00307-0
V. Ravi et al. / European Journal of Operational Research 126 (2000) 526–533
[...] to generate these fuzzy if-then rules directly from numerical data. Kosko [6] employed neural networks to achieve this goal. Later, Ishibuchi et al. [3] proposed a sound methodology to generate such rules from numerical data and then went on to apply a genetic algorithm to determine a compact rule set with a high classification power [4]. Then a software package named WINROSA [12], which automatically generates fuzzy if-then rules from numerical data using statistical methods, became available on the market. However, all the aforementioned studies differ from that of [3,4] in several aspects.

Throughout this paper, the partition of a pattern space means its granularity. To generate fuzzy if-then rules from numerical data one must (i) find the partition of a pattern space into fuzzy subspaces and (ii) determine the fuzzy if-then rule for each fuzzy partition [3,4]. Using those fuzzy if-then rules, either the training data or the test data are classified, which is essentially the classification phase. The performance of such a classifier depends very much on the choice of a fuzzy partition. If a fuzzy partition is too coarse, many patterns may be misclassified. On the other hand, if it is too fine, many fuzzy if-then rules cannot be generated due to the lack of training patterns in the corresponding fuzzy subspaces. In their earlier paper, Ishibuchi et al. [3] proposed distributed fuzzy rules, which consider the fuzzy rules corresponding to both coarse and fine partitions of a fuzzy subspace. For example, a two-dimensional pattern space gives rise to $90 = 2^2 + 3^2 + 4^2 + 5^2 + 6^2$ fuzzy if-then rules, assuming that each feature dimension is divided into at most 6 partitions. Thus, they considered all 5 rule tables corresponding to all the partitions simultaneously. Also, by considering all the fuzzy partitions simultaneously, the above-mentioned difficulty in choosing an appropriate partition is obviated.

The main drawback of this approach, however, is that the number of fuzzy if-then rules increases exponentially for classification problems with high-dimensional pattern spaces [5], such as the wine classification problem [3], where 13 feature variables are present. For example, if up to 5 partitions are used for each of the 13 feature variables, the total number of rules would be $2^{13} + 3^{13} + 4^{13} + 5^{13}$. To overcome this problem, Ishibuchi et al. [6] have introduced a method where fuzzy if-then rules with a small number of antecedent conditions are generated as candidate rules. However, the authors are of the opinion that it is still not a complete remedy, because the method is not a general one and it is not difficult to find problems where rules with a small number of antecedent conditions are intractable. This motivated us to develop alternative methods which concentrate on feature selection or on reducing the feature space dimension by transforming it. Thus it is meaningful to look for any unimportant variables (features) and remove them from the classification process. This results in reduced computational time and memory requirements and an easy-to-use classifier with a manageable number of features. The point we would like to drive home is that feature selection is an essential component of any classifier, especially when dealing with problems having a large number of features. Ravi and Zimmermann [9] addressed this problem by resorting to a software plug-in to DataEngine, viz. FeatureSelector [11]. They used it as a pre-processor to select the most salient features from the original set of features and went on to derive a compact set of fuzzy if-then rules with high classification power.

In the present paper, however, the authors propose another way of reducing the feature space dimension, via principal component analysis (PCA). It is a traditional multivariate statistical technique frequently used for data compression [10]. However, the authors wish to make it clear that a comparison of the present study with our earlier one [9] is not meaningful, because the principal components are only linear combinations of the original feature variables and hence do not reflect the importance or otherwise of the original variables. The rest of the paper is structured as follows. Section 2 gives an overview of principal component analysis and an algorithm to compute the principal components. Section 3 briefly presents the fuzzy rule based classification method and the formulation of the multi-objective optimization problem. Results of the numerical simulations are discussed in Section 4, and Section 5 concludes the paper.
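The rule counts quoted in the introduction follow from summing K^n over the grid sizes K = 2, ..., L for n features. The short Python sketch below only reproduces that arithmetic; the function name is ours and is purely illustrative.

```python
def distributed_rule_count(n_features: int, max_partitions: int) -> int:
    """Number of rules in the distributed representation.

    One rule table per grid size K = 2..max_partitions, each table
    holding K**n_features fuzzy subspaces (one candidate rule each).
    """
    return sum(k ** n_features for k in range(2, max_partitions + 1))


if __name__ == "__main__":
    # Two features, up to 6 partitions: 4 + 9 + 16 + 25 + 36
    print(distributed_rule_count(2, 6))    # 90
    # Thirteen wine features, up to 5 partitions
    print(distributed_rule_count(13, 5))   # 1289414504
    # Five features, up to 5 partitions
    print(distributed_rule_count(5, 5))    # 4424
```

With 13 features the candidate set exceeds a billion rules, whereas five features with the same grid sizes give only a few thousand, which illustrates why reducing the feature space dimension matters here.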
2. Principal component analysis

2.1. Algorithm to determine principal components [10]

[...] the first principal component accounts for the maximum variance, the second principal component accounts for the second largest variance, and so on.
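The algorithm of [10] is not reproduced in this section, so the following is only a minimal sketch of one standard way to obtain principal components, namely an eigen-decomposition of the sample covariance matrix with NumPy. The function name, the use of the covariance (rather than correlation) matrix, and the choice of `numpy.linalg.eigh` are our assumptions and not necessarily what the authors implemented.

```python
import numpy as np


def principal_components(X, n_components):
    """Project X (patterns x features) onto its leading principal components.

    Assumption: components are eigenvectors of the sample covariance
    matrix, ordered by decreasing eigenvalue (variance explained).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # features x features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-order to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return scores, explained_ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 60 patterns, 4 features of unequal spread
    X = rng.normal(size=(60, 4)) * np.array([5.0, 2.0, 1.0, 0.1])
    scores, ratio = principal_components(X, 2)
    print(scores.shape, ratio)
```

Because the membership functions used later are defined on the unit interval, the component scores would presumably have to be rescaled to [0, 1] before being passed to the fuzzy classifier; that step is not shown here.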
[...] is the ith fuzzy subset and the superscript K indicates the number of fuzzy subsets on each axis. Thus, K denotes the grid size of a fuzzy partition and differentiates the rules belonging to different rule tables corresponding to 2, 3, ..., L partitions in the distributed representation of fuzzy rules. A symmetric triangular membership function in the unit interval [0, 1] is used for $A_i^K$, $i = 1, \ldots, K$ [3,4,9]. Fig. 1 indicates the distributed representation of the fuzzy if-then rules for 2 features when each feature is divided into a maximum of 3 partitions. For problems involving M classes and 2 features, a fuzzy if-then rule corresponding to $K^2$ fuzzy subspaces has the following structure:

Rule $R_{ij}^K$: If $x_{p1}$ is $A_i^K$ and $x_{p2}$ is $A_j^K$ then $X_p$ belongs to Class $C_{ij}^K$ with $CF = CF_{ij}^K$, $i = 1, 2, \ldots, K$ and $j = 1, 2, \ldots, K$;

$$S_{ALL} = S^2 \cup S^3 \cup \cdots \cup S^L = \{ R_{ij}^K \mid i = 1, 2, \ldots, K;\ j = 1, 2, \ldots, K \text{ and } K = 2, 3, \ldots, L \}, \qquad (2)$$
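To make the grid notation concrete, the sketch below evaluates a symmetric triangular membership function on [0, 1] and enumerates the rule indices (K, i, j) of S_ALL for two features. The specific form, with peaks at (i - 1)/(K - 1) and half-width 1/(K - 1), is the one commonly used in the Ishibuchi-style literature cited as [3,4]; it is assumed here because the defining equation is not part of this section, and the function names are ours.

```python
def triangular_membership(x: float, i: int, K: int) -> float:
    """Grade of x in the i-th of K symmetric triangular fuzzy sets on [0, 1].

    Assumed form: peak a = (i - 1) / (K - 1), half-width b = 1 / (K - 1),
    for i = 1..K and K >= 2.
    """
    a = (i - 1) / (K - 1)
    b = 1.0 / (K - 1)
    return max(1.0 - abs(x - a) / b, 0.0)


def all_rule_indices(L: int):
    """Index triples (K, i, j) of every rule R_ij^K in S_ALL for 2 features."""
    return [(K, i, j)
            for K in range(2, L + 1)
            for i in range(1, K + 1)
            for j in range(1, K + 1)]


if __name__ == "__main__":
    # Grades of x = 0.3 in the three fuzzy sets of a K = 3 partition
    print([triangular_membership(0.3, i, 3) for i in (1, 2, 3)])
    # Two features, at most 3 partitions: 2*2 + 3*3 rules
    print(len(all_rule_indices(3)))
```

Under this assumed form the grades of any x in [0, 1] sum to one across the K fuzzy sets, so neighbouring sets overlap and each pattern is covered by at most two sets per axis.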
3.1. The multi-objective combinatorial optimization problem

As in [4,9], the main objective is to find a compact rule set S with very high classification power by employing a combinatorial optimization algorithm. The two objectives in the present problem are: (i) maximize the number of correctly classified patterns and (ii) minimize the number of fuzzy if-then rules. Accordingly, we have

Maximize $NCP(S)$ and Minimize $|S|$
subject to $S \subseteq S_{ALL}$,

where $NCP(S)$ is the number of patterns correctly classified by S and $|S|$ is the number of fuzzy if-then rules in S. This is further reformulated as the scalar optimization problem below.

Maximize $f(S) = W_{NCP} \cdot NCP(S) - W_S \cdot |S|$
subject to $S \subseteq S_{ALL}$. (3)

Since the classificatory power of the system is more important than its compactness [3,4,9], the weights have been specified as $0 < W_S \ll W_{NCP}$ and taken as $W_{NCP} = 10.0$ and $W_S = 1.0$ following [4]. For the details regarding the coding of the rules used in the optimization module, the reader is referred to [9]. We employ a meta-heuristic, viz. the modified threshold accepting algorithm [9], to solve the problem just described. It should be kept in mind, however, that the algorithm used is a heuristic and that, depending on the initial feasible solution, the best solutions reported here may not be efficient.

4. Numerical simulations and results

The first illustrative example solved using the methodology presented here is the well-known wine classification problem, for which the data are freely available on the Internet [2]. It has 13 features (attributes) which classify 178 patterns into three types of wines. The second numerical example concerns the determination of breast cancer in humans, with data from the Wisconsin University hospital in Madison, USA. This is also freely available on the Internet via anonymous ftp from ics.uci.edu in directory /pub/machine-learning-databases/wisconsin-breast-cancer. This problem has 9 features or attributes which determine whether a patient's condition is benign or malignant. There are 683 samples or patterns. This data has been used in the past by Mangasarian et al. [7,8] and Bennet et al. [1]. Software in ANSI C has been developed by the authors on a Pentium 100 MHz machine under the Windows 95 platform, using the MS-Visual C++ 5.0 compiler, to implement the model.

The methodology presented is tested in two ways: (i) using the training data itself as the test data and (ii) using the leave-one-out technique in the testing phase. The latter method is preferable, as there is the danger of over-fitting in the former method. In each of these methods, all the feature spaces have been divided into a maximum of 5 partitions for both examples. This is done in order to keep the computational complexity at a reasonable level, as we work with a new set of 5 features (the principal components) in both examples. Further, the study has been conducted for 5 cases, each corresponding to a different aggregator, viz. (1) the product operator, (2) the min operator, (3) the γ-operator (compensatory and) [15], (4) the fuzzy and [14] and (5) a convex combination of the min and max operators [15].

Results of the wine classification problem (see Table 1) indicate that the product operator performed consistently well and gave the best solution, with 100% classification with just 11 rules, whereas the γ-operator with γ = 0.1 came closely behind, giving 100% classification with 13 rules for both 4 partitions and 5 partitions of the features. The min operator scored over the remaining operators, achieving a high classification power of 100% with 18 and 20 rules for 4 and 5 partitions, respectively. The fuzzy and and the convex combination of min and max operators did not provide good solutions, though the former performed better of the two when 4 partitions were considered.

The same example, when studied with the leave-one-out technique (see Table 2), produced different results. Both the product operator and the γ-operator (with γ = 0.1) provided the best
solution with 100% classification with 3.04 rules on average, when 5 partitions were considered. The fuzzy and and the convex combination of min and max operators did not perform well, but they outperformed the min operator when 4 partitions were used.

Table 1
Results of example 1 (training data used as test data)

# Partitions   Product         Minimum         γ-operator      Fuzzy and       min/max (a)
               CP      |S|     CP      |S|     CP      |S|     CP      |S|     CP      |S|
5              100     11      100     20      100     13      84.27   69      58.43   53
4              100     11      100     18      100     13      98.31   13      97.19   15
3              86.52   7       74.16   6       84.83   7       58.98   8       62.36   4

(a) min/max indicates convex combination of min and max operators.
CP: classification power (%); |S|: number of rules.

Table 2
Results of example 1 (leave-one-out technique)

# Partitions   Product         Minimum         γ-operator      Fuzzy and       min/max (a)
               CP      |S|     CP      |S|     CP      |S|     CP      |S|     CP      |S|
5              100     3.04    91.01   2.74    100     3.04    82.02   62.56   91.57   87.9
4              100     3.07    93.26   2.87    100     3.07    99.44   3.38    98.88   3.36
3              60.11   1.8     60.11   1.8     60.11   1.8     60.11   1.9     60.11   1.8

(a) min/max indicates convex combination of min and max operators.
CP: classification power (%); |S|: number of rules (average).

Table 3
Results of example 2 (training data used as test data)

# Partitions   Product         Minimum         γ-operator      Fuzzy and       min/max (a)
               CP      |S|     CP      |S|     CP      |S|     CP      |S|     CP      |S|
5              98.54   25      97.51   29      98.24   27      95.6    35      95.02   29
4              98.39   13      97.36   16      97.95   13      96.78   17      95.31   9
3              97.95   12      96.33   13      97.8    10      95.46   5       87.99   7

(a) min/max indicates convex combination of min and max operators.
CP: classification power (%); |S|: number of rules.

Table 4
Results of example 2 (leave-one-out technique)

# Partitions   Product         Minimum         γ-operator      Fuzzy and       min/max (a)
               CP      |S|     CP      |S|     CP      |S|     CP      |S|     CP      |S|
5              100     9.3     99.85   7.3     100     7.3     96.34   22.23   98.68   22.82
4              92.39   2.78    92.68   2.79    92.38   2.78    95.75   2.87    100     3
3              98.09   3       82.28   2.47    97.66   2.93    98.83   2.96    71.88   2.15

(a) min/max indicates convex combination of min and max operators.
CP: classification power (%); |S|: number of rules (average).
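For orientation, the sketch below gives textbook forms of the five aggregation operators compared in Tables 1–4, together with the scalar objective of Eq. (3). The operator definitions follow the forms usually given in the fuzzy set literature (for example in [14,15]); the exact parameterisation used by the authors is not stated in this section, so the default values of `gamma` for the fuzzy and and the min/max combination, like all function names, are illustrative assumptions. Only γ = 0.1 for the γ-operator and the weights 10.0 and 1.0 come from the text.

```python
from math import prod


def product_op(mu):
    """Product of the antecedent membership grades."""
    return prod(mu)


def min_op(mu):
    """Minimum of the antecedent membership grades."""
    return min(mu)


def gamma_op(mu, gamma=0.1):
    """Compensatory and: (product)^(1-gamma) * (algebraic sum)^gamma."""
    and_part = prod(mu)
    or_part = 1.0 - prod(1.0 - m for m in mu)
    return and_part ** (1.0 - gamma) * or_part ** gamma


def fuzzy_and(mu, gamma=0.5):
    """Fuzzy and: convex combination of the minimum and the arithmetic mean."""
    return gamma * min(mu) + (1.0 - gamma) * sum(mu) / len(mu)


def min_max_combination(mu, gamma=0.5):
    """Convex combination of the min and max operators."""
    return gamma * min(mu) + (1.0 - gamma) * max(mu)


def fitness(ncp, n_rules, w_ncp=10.0, w_s=1.0):
    """Scalar objective f(S) = W_NCP * NCP(S) - W_S * |S| of Eq. (3)."""
    return w_ncp * ncp - w_s * n_rules


if __name__ == "__main__":
    grades = [0.5, 0.8]  # hypothetical antecedent grades for one pattern
    for op in (product_op, min_op, gamma_op, fuzzy_and, min_max_combination):
        print(op.__name__, op(grades))
    # All 178 wine patterns correct with 11 rules versus with 13 rules
    print(fitness(178, 11), fitness(178, 13))
```

With these weights, one additional correctly classified pattern is worth ten rules, so the search favours accuracy first and compactness second, in line with the ordering $0 < W_S \ll W_{NCP}$.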
[...] International Data Analysis Symposium, Aachen, Germany, 1997.
[12] WINROSA, Manual, MIT GmbH, Aachen, Germany, 1997.
[13] L. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.
[14] H.-J. Zimmermann, P. Zysno, Latent connectives in human decision making, Fuzzy Sets and Systems 4 (1980) 37–51.
[15] H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, 2nd ed., Kluwer Academic Publishers, Dordrecht, 1991.