Association Rule and Decision Tree Based Methods For Fuzzy Rule Base Generation

Ferenc Peter Pach and Janos Abonyi
Pannon University, Department of Process Engineering, Veszprem, P.O. Box 158, H-8201, Hungary
http://www.fmt.vein.hu/softcomp, abonyij@fmt.vein.hu

Abstract: This paper focuses on the data-driven generation of fuzzy IF...THEN rules. The resulting fuzzy rule base can be applied to build a classifier, a model used for prediction, or a decision support system. Among the wide range of possible approaches, decision tree and association rule based algorithms are reviewed, and two new approaches are presented that rely on an a priori, fuzzy clustering based partitioning of the continuous input variables. An application study is also presented, in which the developed methods are tested on the well-known Wisconsin Breast Cancer classification problem.

I. INTRODUCTION

Human logic can be represented well by logical expressions in the syntax of rules, with an antecedent and a consequent part. A short example: if somebody has forgotten her/his umbrella at home and it is pouring with rain, then the chances are that she/he will get soaked. A set of logical rules is called a rule base; it is an easy-to-use and useful representation of the knowledge of a given area. Various types of logical rules can be distinguished by the decision borders they create in a multidimensional feature space: standard crisp propositional IF...THEN rules provide overlapping hyperrectangular covering areas, threshold logic rules are equivalent to separating hyperplanes, while fuzzy rules are based on real-valued predicate functions (this summary follows the prologue to [52]). Accordingly, many rule based methods have been developed for extracting knowledge from databases. The paper [40] introduces a genetic programming (GP) and fuzzy logic based algorithm that extracts explanatory rules from microarray data. A hybrid approach is proposed in [7], where standard GP and a heuristic hierarchical crisp rule base construction are combined. A fuzzy mining algorithm based on Srikant and Agrawal's method [48] is proposed for extracting generalized rules with the use of taxonomies [51]. In [34], compact fuzzy rule extraction is based on adaptive data approximation using B-splines.

Rule bases are used efficiently in many areas, but this paper concentrates primarily on prediction applications. Rule bases have been applied successfully, for example, to stock exchange estimation [37], weather forecasting [32], and sales forecasting [19]. High prediction accuracy of the applied model (built from the extracted rules) is very important, but understanding the model can also be critical in many areas. It is very useful to know what lies behind the decisions, and the rules can then be edited or changed by specialists of the application area. Compact and comprehensible predictive models, supported by visualization, can lead to better human decisions. The paper [52] presents many computational intelligence techniques (based on decision trees, neural networks, etc.) that are very useful tools for rule extraction and data understanding.

In the development of new rule based methods for prediction applications, besides retaining and improving the achieved accuracy (in classification problems), one of the most important goals is to increase the interpretability of the rules. One possible way to take this aspect into account is to adopt fuzzy logic. Fuzzy methods represent the discovered rules in a way that is more natural for humans, and fuzzy logic also yields more robust predictive models (classifiers) in the case of false, inconsistent, and missing data. In this paper, a fuzzy decision tree based method (Section II-B) and a fuzzy association rule based method (Section III-B) are introduced for fuzzy rule base generation. Our main goal is to show how to construct compact fuzzy rule bases that can be used for data analysis, classification, or prediction. Therefore, prediction accuracy (for classification problems) and interpretability are both in focus during the rule extraction steps of both algorithms. The classification effectiveness of the proposed methods is tested on the Wisconsin Breast Cancer problem, and the results are summarized in a short application study (Section IV).

II. FUZZY DECISION TREE BASED METHODS

A. Existing decision tree induction algorithms

Decision tree based methods are widely used in data mining and decision support applications. Decision trees are fast and easy to use for rule generation and classification problems; moreover, they are an excellent tool for representing decisions.
The popularity and spread of decision trees stem from the ID3 algorithm by Quinlan [46]. Many studies have been written on the induction and analysis of decision trees [54], [47], [35], [36], [55]. The application areas of decision trees are also very broad [6], [45], [15], [50], [49], [38].

Fig. 1. Fuzzy decision tree for the Wisconsin Breast Cancer classification problem (decision nodes: bare nuclei, cell size, and cell shape, with branches small and large)

Since the 1980s, many fuzzy decision tree induction algorithms have been introduced [2], [42], [39]. Fuzzy decision trees represent the discovered rules in a way that is more natural for humans (for example, thanks to linguistic variables). Reference [22] gives a detailed introduction to non-fuzzy rules and the different kinds of fuzzy rules. Figure 1 shows an example fuzzy decision tree for a classification problem. The aim of the classification is to distinguish between benign and malignant cancers based on the available attributes. The example tree uses only three attributes (three decision points: bare nuclei, cell size, and cell shape) and represents three rules (the paths from the root to the class labels in the leaves). In classification problems, the continuous attributes of the input domain need to be partitioned. For example, in Figure 1 the attribute cell size is partitioned into two overlapping partitions (two fuzzy sets), small and large. Many types of membership functions (triangular, trapezoidal, Gaussian, etc.) can be used for the partitions. While the literature discusses various methods, this paper focuses only on a priori partition based fuzzy decision tree induction algorithms, in which the partitioning step precedes the tree induction step. A new a priori partition and decision tree based extraction method is presented in the next subsection.

B. A fuzzy decision tree based method

Our method (Figure 2, left) consists of the following main steps:
1) A supervised clustering algorithm is used to partition the input domain. The supervised method takes the class labels into account during clustering; therefore, the resulting partitions, i.e., the fuzzy membership functions (fuzzy sets), represent not only the distribution of the data but also the distribution of the classes.
2) In a pre-pruning step, the resulting partitions are analyzed and unduly overlapping fuzzy sets are combined.
3) The results of the pre-pruning step are input parameters (besides the data) of the tree induction algorithm. The applied tree induction method is the FID (Fuzzy Induction on Decision Tree) algorithm by C. Z. Janikow [35].
4) The resulting decision tree is analyzed and transformed into a fuzzy rule base by an appropriate method.
5) Since the FID algorithm can generate a larger and more complex decision tree than necessary, a post-pruning method is applied to filter out unnecessarily long rules and to remove rules that are weak from the classification point of view.

This method provides a compact and transparent fuzzy rule base that can be used to build accurate fuzzy classifiers.

Fig. 2. Main steps of the decision tree (left: data partition method, pre-pruning, decision tree induction, fuzzy rule base, post-pruning) and the association rule (right: data partition method, frequent itemsets, rule generation, fuzzy rule base, post-pruning) based methods

III. FUZZY ASSOCIATION RULE BASED METHODS

A. Existing association rule mining and associative classifier algorithms

Besides decision tree based techniques, association rule mining algorithms are the most frequently used data mining tools for rule extraction. Many kinds of methods have been developed [3], [5], [4], [16], [18], [8], [12], [13], [14], [9], [11], [21], [17], [20], but most of them share two main steps: the mining starts with a search for frequent item sets (first defined in [3]), and then association rules are generated from the large item sets. The selection of an appropriate algorithm depends on the structure (sparse or dense) and the size of the analyzed database; the application area also notably influences which methods are suitable. The first association rule mining algorithms were primarily developed to discover customer habits in market basket analysis [5]. An example transactional database is shown in Table I.

TABLE I. An example transactional database
T | Products
1 | milk, bread, beer, egg
2 | bread, diaper, egg, beer
3 | milk, bread
4 | milk, diaper, wine, beer
5 | milk, bread, diaper, beer

Each row has a transaction identity number (T) and contains the products bought together in that transaction. The aim is to understand the behavior of retail customers, in other words, to find associations among items purchased together. The products are called items, and item sets are sets of products. An item (item set) is called frequent if its support (the number, or fraction, of transactions that contain it) is higher than the predefined minimum support threshold. For example, if the minimum support is set to fifty percent, the item set <diaper, beer> is frequent, since it occurs in three of the five transactions (60%). A famous example of an association rule in such a database is diaper => beer, i.e., young fathers sent off to the store to buy diapers reward themselves for their trouble. An association rule has a confidence measure that represents the strength of the relationship between the antecedent and consequent parts of the rule. An association rule is called valid if and only if its support and confidence values are higher than the support and confidence thresholds.

Besides enabling rule based analysis of transactional databases, the set of discovered association rules can also be used to build classifiers. The CBA algorithm [43] was the first to integrate association and classification rule mining techniques efficiently. In the last decade, many associative classifier algorithms have been presented [23], [44], [53], [41], [57], [56], [58], [10], [59], [30]. These methods give rule bases with ever higher classification power, but most of them generate classifiers that are too large and complex. As emphasized in Sections I and II, overly complex rule bases are undesirable from the point of view of interpretability. Therefore, our main goal was to construct an associative method that produces compact fuzzy rule bases from data, which can be used to build accurate fuzzy classifiers.
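The support and confidence definitions above, together with the Apriori principle used later in Section III-B, can be checked on Table I with a short sketch. It is a minimal illustration written for this example, not the authors' implementation: the function names are hypothetical, support is taken as the fraction of transactions, and candidates are kept when their support is at least the threshold (with five transactions and a fifty percent threshold, "at least" and "higher than" select the same item sets).

```python
from itertools import combinations

# Table I: example transactional database (transaction id -> purchased items)
TRANSACTIONS = {
    1: {"milk", "bread", "beer", "egg"},
    2: {"bread", "diaper", "egg", "beer"},
    3: {"milk", "bread"},
    4: {"milk", "diaper", "wine", "beer"},
    5: {"milk", "bread", "diaper", "beer"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent: supp(A union C) / supp(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def apriori(transactions, min_support):
    """Level-wise frequent item set search based on the Apriori principle."""
    items = sorted(set().union(*transactions.values()))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that reach the minimum support.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-item sets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-item subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return frequent


if __name__ == "__main__":
    frequent = apriori(TRANSACTIONS, min_support=0.5)
    for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
    print(confidence(frozenset({"diaper"}), frozenset({"beer"}), TRANSACTIONS))  # 1.0
```

With a fifty percent threshold, the search keeps four single items (milk, bread, beer, diaper) and four pairs, including <diaper, beer>; the triple <milk, bread, beer> is dropped because it occurs in only two of the five transactions. The rule diaper => beer has confidence 1.0, whereas the reverse rule beer => diaper has 0.75, which illustrates that confidence, unlike support, is not symmetric.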
The next subsection introduces our new fuzzy association rule based method.

B. A fuzzy association rule based method

Our method (Figure 2, right) consists of the following main steps:
1) In the first step, a partitioning method is needed to obtain discrete data elements from the continuous attributes. The applied method is a fuzzy clustering algorithm that determines trapezoidal fuzzy membership functions for each attribute.
2) Since the membership functions (fuzzy sets) are treated as fuzzy items, the frequent item sets can be searched in a simple way: the membership values determine the supports of the items. The search for larger item sets is based on the Apriori principle [5].
3) Since our main application goal is the identification of a classifier model, association rules with a class label in the consequent part are generated from the frequent item sets.
4) The classification rules that contribute most to the prediction results are selected by a correlation measure; these rules are called important rules. Only positively correlated rules with above-average correlation are stored in the rule base. The proposed method works efficiently without database coverage analysis (which demands high computational capacity).
5) Unnecessarily complex, redundant, and conflicting rules are identified in a post-pruning step and removed from the rule base; therefore, only the most important and most confident rules are used in the fuzzy associative classifiers.

Earlier versions of both presented methods are detailed in our publications [29], [24], [25], [27], [26], [28]. Our current results are encouraging; the classification power and the complexity reduction achieved by the presented methods are demonstrated in a short application study in the following section.

Fig. 3. Class distribution of Wisconsin Breast Cancer Data (benign and malignant instances)

IV. APPLICATION STUDY

If the proposed methods generate only classification rules, the rule bases together with the input partitions form classification models. This section presents an empirical analysis of the classification power of the proposed algorithms. The Wisconsin Breast Cancer data (WBCD), available from the University of California, Irvine (UCI Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html), is a real classification problem. The aim of the classification is to distinguish between benign and malignant cancers based on nine available measurements: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. The attributes have integer values in the range [1, 10].
The original database contains 699 instances; 16 of these are omitted because they are incomplete, which is common practice in other studies. The class distribution (Figure 3) is 65.5% benign and 34.5% malignant.
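Step 2 of the association rule based method (Section III-B) derives item supports from membership values. For attributes such as the WBCD measurements, whose values lie in [1, 10], this can be sketched as follows. The sketch rests on assumptions not stated in the paper: the trapezoid breakpoints in the hypothetical `PARTITION` dictionary are illustrative (not the clustering results of Figure 4), the fuzzy items of an item set are combined with the product t-norm, fuzzy support is the mean combined degree over all records, and the toy records are made up rather than taken from the WBCD.

```python
import numpy as np


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between.

    a == b (or c == d) gives an open left (or right) shoulder.
    """
    x = np.asarray(x, dtype=float)
    rise = (x >= a).astype(float) if b == a else np.clip((x - a) / (b - a), 0.0, 1.0)
    fall = (x <= d).astype(float) if d == c else np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(rise, fall)


# Hypothetical two-set partition of an attribute with values in [1, 10];
# the breakpoints are illustrative, not the clustering results of Fig. 4.
PARTITION = {"small": (1, 1, 3, 7), "large": (3, 7, 10, 10)}

# Toy records (columns: uniformity of cell size, bare nuclei) and class labels;
# these values are made up for illustration and are not taken from the WBCD.
TOY_X = np.array([[1, 1], [2, 1], [1, 3], [8, 10], [5, 9], [10, 10]])
TOY_Y = np.array(["benign"] * 3 + ["malignant"] * 3)


def item_degrees(data, items):
    """Degree to which each row contains the fuzzy item set (product t-norm)."""
    degrees = np.ones(len(data))
    for column, label in items:
        degrees *= trapezoid(data[:, column], *PARTITION[label])
    return degrees


def fuzzy_support(data, items):
    """Fuzzy support: mean membership degree of the item set over all rows."""
    return float(item_degrees(data, items).mean())


def fuzzy_confidence(data, labels, items, cls):
    """Share of the item set's membership mass that falls on rows of class `cls`."""
    degrees = item_degrees(data, items)
    total = degrees.sum()
    return float(degrees[labels == cls].sum() / total) if total > 0 else 0.0


if __name__ == "__main__":
    antecedent = [(0, "small"), (1, "small")]
    print(fuzzy_support(TOY_X, antecedent))
    print(fuzzy_confidence(TOY_X, TOY_Y, antecedent, "benign"))
```

For the toy records, the fuzzy item set {uniformity of cell size is small, bare nuclei is small} has a fuzzy support of 0.5, and the class rule built from it has confidence 1.0 for benign. The two hypothetical sets overlap on [3, 7] and their degrees sum to one everywhere, i.e., they form a Ruspini-type partition; the product is only one possible t-norm, and the minimum is an equally common choice.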

Fig. 4. Partitions (trapezoidal fuzzy membership functions) determined by the supervised Gath-Geva clustering algorithm for the Wisconsin classification problem



A. Classification by the fuzzy decision tree based method

First, consider the results of our decision tree based algorithm. The selected a priori partition method was the supervised Gath-Geva clustering algorithm [31], [1]. The initial number of partitions for each attribute was equal to the number of classes, i.e., two. The resulting partitions are shown in Figure 4. The classification accuracy is measured by ten-fold cross-validation. If the post-pruning factor is set to 1.6 (in the fifth step of our algorithm), the average accuracy is 95.27% with 3.2 rules and 6.8 conditions on average. An example rule base containing three fuzzy rules is the following:
If uniformity of cell size is small and bare nuclei is small Then benign.
If uniformity of cell size is large and uniformity of cell shape is large and bare nuclei is small Then malignant.
If bare nuclei is large Then malignant.
The decision tree containing this rule base is shown in Figure 1. It is a very compact and interpretable, yet accurate, fuzzy classification rule base for the Wisconsin problem.

B. Classification by the fuzzy association rule based method

In the association rule based method, an implementation of the Gustafson-Kessel (GK) clustering algorithm [33] is applied first to partition the input attributes. As in the case of the decision tree based method, the number of partitions for each attribute was two. The average classification accuracy (by ten-fold cross-validation) is 95.85%. A visualization tool has also been developed to represent the structure of the resulting fuzzy rule base. Figure 5 shows an example rule base containing ten rules with 22.7 conditions. If the GK algorithm is replaced by the simplest technique, the Ruspini-type fuzzy partition method, the resulting classifier is more accurate (96%) and smaller (8.8 rules on average), but more complex (36.8 conditions on average). An example rule base is depicted in Figure 6.
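The three-rule base obtained by the decision tree based method in Section IV-A can be read directly as a fuzzy classifier. The sketch below evaluates it under assumptions that the paper does not state: the fuzzy sets are hypothetical trapezoids shared by all three attributes (the real partitions in Figure 4 come from Gath-Geva clustering and differ per attribute), AND is the product, and the class of the most strongly firing rule wins. The inference used inside FID and in the authors' classifiers may differ; the attribute keys such as `cell_size` are also hypothetical names.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    rise = (1.0 if x >= a else 0.0) if b == a else min(max((x - a) / (b - a), 0.0), 1.0)
    fall = (1.0 if x <= d else 0.0) if d == c else min(max((d - x) / (d - c), 0.0), 1.0)
    return min(rise, fall)


# Hypothetical fuzzy sets on [1, 10]; the actual Fig. 4 partitions differ per attribute.
FUZZY_SETS = {"small": (1, 1, 3, 7), "large": (3, 7, 10, 10)}

# The three rules of the example rule base: (antecedent, consequent class).
RULES = [
    ({"cell_size": "small", "bare_nuclei": "small"}, "benign"),
    ({"cell_size": "large", "cell_shape": "large", "bare_nuclei": "small"}, "malignant"),
    ({"bare_nuclei": "large"}, "malignant"),
]


def firing_strength(sample, antecedent):
    """Degree of fulfilment of a rule antecedent, using the product as AND."""
    strength = 1.0
    for attribute, label in antecedent.items():
        strength *= trapezoid(sample[attribute], *FUZZY_SETS[label])
    return strength


def classify(sample):
    """Single-winner inference: each class scores its strongest rule; the best class wins.

    Ties (including the case where no rule fires) go to the class listed first in RULES.
    """
    scores = {}
    for antecedent, cls in RULES:
        scores[cls] = max(scores.get(cls, 0.0), firing_strength(sample, antecedent))
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = classify({"cell_size": 5, "cell_shape": 5, "bare_nuclei": 4})
    print(label, scores)  # benign {'benign': 0.375, 'malignant': 0.25}
```

For this borderline sample, the benign rule fires with degree 0.375, while the strongest malignant rule (If bare nuclei is large) fires with only 0.25, so the sample is labeled benign. Because every rule keeps a graded degree of fulfilment, the scores also indicate how close the decision was.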
Some rules are contained in the rule bases of both methods, but the figures show that the associative method yields larger rule bases (with both partitioning techniques) than the decision tree based algorithm. In Figure 5, the fifth rule is identical to the third rule of the tree (If bare nuclei is large Then malignant). However, the rule base of the decision tree captures the knowledge important for classification more compactly; for example, rules eight and nine of Figure 5 together appear in the second rule of the tree.

Fig. 5. Fuzzy rule base for the Wisconsin Breast Cancer classification problem (generated by the association rule based method with the Gustafson-Kessel clustering algorithm as partitioning technique)

Fig. 6. Fuzzy rule base for the Wisconsin Breast Cancer classification problem (generated by the association rule based method with the Ruspini-type partitioning technique)

V. CONCLUSIONS

This paper gave a short overview of existing decision tree and association rule mining based rule extraction methods, focusing on building fuzzy classifier systems. Besides the literature review, two new rule extraction methods have been presented to generate compact and accurate fuzzy rule base classifiers. The results show the similarities of the two approaches and highlight that the partitioning of the input variables plays an important role in the performance of the resulting classifiers.

REFERENCES
