Algorithms of association rules extraction: State of the art

Hamida, Amdouni; Mohsen, Gammoudi Mohamed

AMDOUNI Hamida
PhD Student, Member of RIADI Laboratory, FST, Tunisia
hamdouni.ecri@gmail.com

GAMMOUDI Mohamed Mohsen
Associate Professor, Member of RIADI Laboratory, ESSAI of Tunis, Tunisia
mohamed.gammoudi@fst.rnu.tn

Abstract: For more than a decade, the task of generating association rules has received considerable attention from researchers, because enterprise decision makers need to be assisted by systems that take into account previously unknown knowledge extracted from huge volumes of data. In this paper, we present a survey of the best-known algorithms used for association rules extraction. We give a comparative study of them and show that they can be classified into several categories.

Keywords: Itemsets; Closed Itemsets; Frequent Itemsets; FCA; Association rules

I. INTRODUCTION

The extraction of association rules is one of the most important techniques of data mining [1]. It consists of extracting previously unknown knowledge (patterns) from a large volume of data in order to help decision makers take efficient and profitable decisions. Early approaches to extracting association rules are based on generating the frequent itemsets [2, 16, 8]. However, because of the considerable computing time of this extraction and the redundancy and irrelevance of the generated rules, a new approach was introduced [12]. It consists of extracting a subset of generic, non-redundant rules without loss of information [3], based on the mathematical foundations of Formal Concept Analysis (FCA) [7]. This approach relies on extracting a subset of itemsets called closed itemsets, their minimal generators and the relation between frequent itemsets. To our knowledge, only the algorithm Prince [9] focuses on organizing the minimal generators according to the partial order relation. Not taking this relation into account leads to the generation of a large number of rules, as is the case for A-Close [12], Closet [13] and Charm [17].
The main goal of this paper is to present a state of the art of the best-known algorithms in the literature for generating association rules. Before presenting these algorithms, we recall the basic concepts necessary for their understanding.

II. BASIC NOTIONS

A. Extraction context of an association rule
Let K = (O, I, R) be a triplet where O and I are, respectively, a set of objects (e.g. transactions) and a set of items, and R ⊆ O × I is a binary relation between objects and items.

B. Table of co-occurrences
It indicates, for each pair of items, the number of co-occurrences in the set of objects.

C. Itemset
It is a nonempty subset of items. An itemset consisting of k elements is called a k-itemset.

D. Support of an itemset
The frequency of simultaneous occurrence of the items of an itemset I' in the set of objects, denoted Supp(I').

E. Frequent itemset (FI)
An FI is a set of items whose support is ≥ a user-specified threshold called minsup. All its subsets are frequent. The set of all frequent itemsets is called SFI.

F. Max itemset
A frequent itemset none of whose proper supersets is frequent.

G. Association rule
Any rule of the form A → B, where A and B are disjoint itemsets, A is its premise (condition) and B is its conclusion.

H. Confidence
The confidence of an association rule R: A → B measures how often the items of B appear in the objects that contain A.

Confidence(R) = Supp(A,B) / Supp(A)   (1)

• Supp(A,B): the number of objects that contain both the itemset A and the itemset B.
• Supp(A): the number of objects that contain A.

Based on the degree of confidence, association rules can be classified as follows:
• Exact rule: a rule whose confidence = 1
• Approximative rule: a rule whose confidence < 1
• Valid rule: a rule whose confidence ≥ a user-specified threshold called minconf

Having recalled these basic notions, we present the different algorithms of association rules extraction in chronological order, showing their advantages and disadvantages.

III. ALGORITHMS OF ASSOCIATION RULES EXTRACTION

In the literature, we can find two categories of algorithms: those which extract the association rules from the frequent itemsets, and those which rely on Formal Concept Analysis (FCA) to generate a subset of the frequent itemsets called frequent closed itemsets.

A. Algorithms based on extraction of frequent itemsets

1) Apriori [2]
Apriori is one of the first algorithms of association rules extraction. In what follows, we present its principle, its advantages and its disadvantages. Its principle consists of two steps: the first finds the set of frequent itemsets (SFI) from the initial extraction context, and the second uses this set to determine the valid association rules.

a) Frequent itemsets search
To find the set of frequent itemsets (SFI), it proceeds iteratively, as follows:
• Find the set of FI of size one, called L1, and from it generate another set, called C2, which contains the candidate FI of size two. Any element of C2 having a support ≥ minsup becomes part of L2.
• In general, any element of Ck+1 is the union of two FI ∈ Lk having k-1 items in common. A generated candidate of size k+1 is deleted from Ck+1 if at least one of its subsets of size k ∉ Lk.
• This last process is repeated until Lk is empty.
The result of this phase is the union of the different Lk determined.

b) Association rules extraction
To extract the association rules from the SFI found, it proceeds iteratively, treating each Lk. For an FI ∈ Lk, the generated rules have the form (FI - C) → C, where C is the set of conclusion items. Any rule whose confidence Supp(FI)/Supp(FI - C) is ≥ minconf is kept.

In summary, the result generated by Apriori is clear and easy to interpret, but the algorithm has several problems. The first one is its theoretical complexity, O(mn2^m) [2], where n = |O| and m = |I|.
The second one is the high number of database accesses needed to extract the SFI, to count the different supports and to generate the rules, which are generally redundant and of little use. Moreover, evaluating these rules requires the user's intervention, which is expensive.

In order to decrease the processing time of frequent itemsets extraction, [2] presents a new version of Apriori called AprioriTid. It counts the support of the candidates by indexing each transaction with its identifier, called TID. In the following part, we present its general principle as well as its advantages and disadvantages.

2) AprioriTid [2]
It uses the same principle as Apriori to generate the candidates, but it counts their supports differently. First, it generates a set called C1 that represents the database. This set contains elements of the form (TID, {c1}), where {c1} is the list of itemsets of size one in the transaction TID. For k > 1, Ck is built from Ck-1. An element of Ck whose list of k-itemsets for its TID is empty is deleted. The support of a candidate k-itemset is equal to its number of occurrences in Ck.

In the first iterations, the set of candidate itemsets can be huge, which causes a storage problem. However, as k increases, the number of elements in Ck becomes smaller than the number of transactions.

3) Partition [14]
The objective of this algorithm is to reduce the number of accesses to the initial context to two. It divides the context into p partitions D1, …, Dp. During the first database access, the locally frequent itemsets of each partition are determined; their union forms the candidate set for the SFI of the whole extraction context. During the second access, it counts the global support of every element of this union.

In summary, Partition makes only two accesses to the database, it does not count candidate itemsets level by level over the whole database and, unlike Apriori, it computes the supports by intersecting TID lists.
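The level-wise search shared by Apriori and its variants, and the confidence test of equation (1), can be sketched as follows. This is a minimal illustration rather than the implementation of [2]: the function names `apriori` and `valid_rules`, the toy context `TOY` and the use of absolute support counts are assumptions made for the example, and supports are recounted by scanning the transactions instead of using a hash-tree.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise search: return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of objects (transactions) that contain the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent itemsets of size one.
    items = {i for t in transactions for i in t}
    level = {}
    for i in items:
        single = frozenset([i])
        count = support(single)
        if count >= minsup:
            level[single] = count

    frequent = dict(level)
    k = 1
    while level:
        # Join step: unite two frequent k-itemsets that share k-1 items.
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Prune step: every subset of size k must belong to Lk.
            if len(c) == k + 1 and all(frozenset(s) in level for s in combinations(c, k)):
                candidates.add(c)
        # Keep the candidates whose support reaches minsup: this is Lk+1.
        level = {}
        for c in candidates:
            count = support(c)
            if count >= minsup:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent

def valid_rules(frequent, minconf):
    """Rules (FI - C) -> C whose confidence Supp(FI) / Supp(FI - C) is >= minconf."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for premise in map(frozenset, combinations(itemset, size)):
                confidence = supp / frequent[premise]
                if confidence >= minconf:
                    rules.append((premise, itemset - premise, confidence))
    return rules

# Hypothetical toy context: five transactions over the items a..e.
TOY = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}, {"a", "b", "c", "e"}]
frequent = apriori(TOY, minsup=2)
rules = valid_rules(frequent, minconf=1.0)
```

With minconf = 1.0 only exact rules are returned; lowering the threshold also yields approximative rules. The lookup `frequent[premise]` is safe because every subset of a frequent itemset is itself frequent.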
4) Dic [5]
Dic divides the database into blocks of M transactions. After going through one block to determine the candidate k-itemsets and count their supports, it generates the (k+1)-itemsets from the frequent k-itemsets and starts to determine their supports.

We observe that Dic makes fewer database accesses than Apriori, since it processes candidates of different sizes simultaneously. However, the storage of the itemsets is a problem, and the cost of counting the supports is higher than for Apriori.

5) Eclat [16]
This algorithm uses the vertical format of the extraction context: each item is associated with a Tidset (the set of all transactions containing this item). It searches the frequent itemsets of size one and two using the horizontal format of the database, and then uses a depth-first traversal based on the concept of equivalence classes (two k-itemsets belong to the same class if they share a common prefix of size k-1; each class is treated separately).

In summary, the vertical format adopted by Eclat simplifies the calculation of the support of an itemset, since it is simply the size of an intersection of Tidsets. This automatically reduces the amount of data processed, as only the transactions containing an itemset are used in the intersection. In addition, this method can be parallelized, since each class can be treated separately to determine the frequent itemsets. The drawback is that this method is effective only for small databases; for large databases, the vertical representation becomes impractical.

6) FP-Growth [8]
To avoid repeated accesses to the context, [8] proposes an algorithm called FP-Growth (Frequent Pattern Growth), which extracts the SFI without generating candidates.
It consists of compressing the database into a compact structure called FP-tree, based on the notion of Trie [10], which is made of:
• A tree with an empty root and intermediate nodes, each containing three pieces of information: the corresponding item, its frequency and a pointer to the next node in the tree.
• A list called index, containing the frequent items. Each item is associated with a pointer to the first node of the tree containing this item.

The construction of this structure requires two accesses to the extraction context:
• The first determines the frequent items and saves them in the index list, sorted in descending order of support.
• The second builds the FP-tree, knowing that the items of a transaction are stored according to their order in index.

A root is created, from which a branch is built for each transaction. A transaction is represented by a list of nodes, each containing the item, its frequency and a pointer to the next node (transactions with the same prefix share the same branch, and identical transactions are represented once).

After the database has been compressed, it is divided into sub-projections called conditional bases. Each of these bases is associated with a frequent item. The extraction of the frequent itemsets is done on each of the projections.

With FP-Growth, the frequent items are sorted in decreasing order of support, which implies that the most frequent items are near the root and are better shared between transactions. The FP-tree is thus a compact and useful representation. Prefixes are shared between transactions, which reduces the number of nodes in the FP-tree, whereas suffixes are not shared. It should be noted, however, that there is a storage problem when the FP-tree becomes large [6].
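The two-pass construction of the FP-tree described above can be sketched as follows. This is an illustrative sketch, not the code of [8]: the names `Node`, `build_fp_tree`, `item_total` and the toy context `TOY` are assumptions, the `next` pointer is taken here to link the nodes that carry the same item, and ties between equally frequent items are broken alphabetically.

```python
from collections import Counter

class Node:
    """FP-tree node: an item, its frequency and a pointer to the next node with the same item."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None

def build_fp_tree(transactions, minsup):
    # First access: count the items and keep the frequent ones, by decreasing support.
    counts = Counter(i for t in transactions for i in set(t))
    order = [i for i, c in sorted(counts.items(), key=lambda x: (-x[1], x[0])) if c >= minsup]
    rank = {item: r for r, item in enumerate(order)}

    root = Node(None, None)                 # empty root
    index = {item: None for item in order}  # item -> first node carrying this item
    last = {}                               # item -> last node linked so far

    # Second access: insert each transaction, items sorted as in index.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                if index[item] is None:
                    index[item] = child
                else:
                    last[item].next = child
                last[item] = child
            child.count += 1  # a shared prefix only increments counters
            node = child
    return root, index

def item_total(index, item):
    """Sum the frequencies of all nodes carrying `item` by following the next pointers."""
    total, node = 0, index[item]
    while node is not None:
        total += node.count
        node = node.next
    return total

# Hypothetical toy context: five transactions over the items a..e.
TOY = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}, {"a", "b", "c", "e"}]
root, index = build_fp_tree(TOY, minsup=2)
```

Following the `next` pointers from `index` recovers the support of each frequent item without another access to the context, which is what the conditional bases rely on.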
7) Conclusion
After presenting some algorithms for generating association rules based on the extraction of frequent itemsets, we can conclude that this method has some advantages, such as an easily understood calculation process and results that are clear and easy to apply. However, it does not provide satisfactory results for large volumes of data. Moreover, it is very difficult to determine the number of items in each rule.

B. Algorithms of association rules extraction using the FCA approach
This strategy has two steps. The first consists of extracting the frequent closed itemsets, based on the mathematical foundations of Formal Concept Analysis (FCA) [7]. The second consists of building a generic base of informative rules in order to provide useful and non-redundant rules. In this part, we define some basic notions and present several algorithms which follow the FCA approach.

1) Basic notions

a) Galois connection: Let K = (O, I, R) be a formal context. For every set of objects A ⊆ O, the set f(A) of attributes in relation R with the objects of A is defined as follows:

f(A) = { i ∈ I | oRi ∀ o ∈ A }   (2)

Dually, for every set of attributes B ⊆ I, the set g(B) of objects in relation R with the attributes of B is defined as follows:

g(B) = { o ∈ O | oRi ∀ i ∈ B }   (3)

The two functions f and g defined between objects and attributes form a Galois connection. The compositions f ∘ g (on sets of attributes) and g ∘ f (on sets of objects), both denoted φ, are closure operators. φ satisfies the following properties, ∀ X, Y ⊆ I (resp. X1, Y1 ⊆ O):
• Idempotent: φ²(X) = φ(X)   (4)
• Extensive: X ⊆ φ(X)   (5)
• Monotone: X ⊆ Y ⇒ φ(X) ⊆ φ(Y)   (6)

b) Frequent Closed Itemset (FCI): An itemset I' is called closed if I' = φ(I'). In other words, an itemset I' is closed if the intersection of the objects that contain I' is equal to I'; it is frequent if its support is ≥ minsup. SFCI is the Set of Frequent Closed Itemsets.
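The operators of equations (2) and (3) and the closure φ = f ∘ g on itemsets can be sketched directly from their definitions. The context `CONTEXT` below is a hypothetical toy example (object identifiers mapped to their items), and the function names are chosen for illustration only.

```python
def g(itemset, context):
    """Equation (3): objects in relation with every item of the itemset."""
    return frozenset(o for o, items in context.items() if itemset <= items)

def f(objects, context):
    """Equation (2): items in relation with every object of the set."""
    common = frozenset().union(*context.values())  # f of the empty set is I
    for o in objects:
        common &= context[o]
    return common

def closure(itemset, context):
    """phi = f o g applied to an itemset."""
    return f(g(frozenset(itemset), context), context)

# Hypothetical toy context: object identifier -> items of this object.
CONTEXT = {
    1: frozenset("acd"),
    2: frozenset("bce"),
    3: frozenset("abce"),
    4: frozenset("be"),
    5: frozenset("abce"),
}

closed_a = closure("a", CONTEXT)   # objects 1, 3, 5 share exactly {a, c}
closed_c = closure("c", CONTEXT)   # {c} is its own closure, hence closed
```

In this context the itemset {a} is not closed, since every object containing a also contains c, whereas {c} is closed. Its support, |g({c})| = 4, decides whether it is frequent for a given minsup.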
c) Minimal Generator: An itemset c ⊆ I is a generator of a closed itemset I' ⇔ φ(c) = I'. It is minimal if none of its proper subsets is a generator of I', and c is a frequent minimal generator if its support is ≥ minsup. The set of frequent minimal generators of I' is denoted GMF_I':

GMF_I' = { c ⊆ I | φ(c) = I' ∧ ∄ c1 ⊂ c such that φ(c1) = I' }   (7)

d) Negative border (GBd-): the set of non-frequent minimal generators.

e) Positive border (GBd+): Let GMFk be the set of all frequent minimal generators:

GBd+ = { c | Supp(c) ≥ minsup ∧ c ∉ GMFk ∧ ∀ c' ⊂ c, c' ∈ GMFk }   (8)

f) Equivalence classes: The closure operator φ divides the set of frequent itemsets into disjoint equivalence classes whose elements have the same support. The largest element of a given class is an FCI, called I', and the smallest ones are its minimal generators GMF_I' [12].

g) Comparable equivalence classes: The classes Ci and Cj are said to be comparable only if the FCI of Ci covers that of Cj.

The five following notions are defined in [7]:

h) Formal concept: a formal concept is a maximal objects-attributes subset where objects and attributes are in relation. More formally, it is a pair (A, B) with A ⊆ O and B ⊆ I, which verifies f(A) = B and g(B) = A. A is the extent of the concept and B is its intent.

i) Partial order relation between concepts: The partial order relation, denoted ≤, is defined as follows: for two formal concepts (A1, B1) and (A2, B2), (A1, B1) ≤ (A2, B2) ⇔ A2 ⊆ A1 and B1 ⊆ B2.

j) Meet / Join: for each pair of concepts (A1, B1) and (A2, B2), there exists a greatest lower bound (resp. a least upper bound) called Meet (resp. Join), denoted (A1, B1) ∧ (A2, B2) (resp. (A1, B1) ∨ (A2, B2)) and defined by:

(A1, B1) ∧ (A2, B2) = (g(B1 ∩ B2), (B1 ∩ B2))   (9)
(A1, B1) ∨ (A2, B2) = ((A1 ∩ A2), f(A1 ∩ A2))   (10)

k) Galois lattice: The Galois lattice associated with a formal context K is a graph composed of the set of formal concepts equipped with the partial order relation ≤. This graph is a representation of all the possible maximal correspondences between a subset of objects of O and a subset of attributes of I.
l) Frequent minimal generators lattice: A partially ordered structure in which each equivalence class contains the corresponding frequent minimal generators [4].

m) Iceberg lattice: A partially ordered structure of frequent closed itemsets having only the join operator. It is considered an upper semi-lattice [15].

n) Generic base of exact association rules: It is a base composed of non-redundant generic rules having a confidence equal to 1, called GB [3]. Given a context (O, I, R), the set of frequent closed itemsets (SFCI) and the set of frequent minimal generators GMFk:

GB = { R: g → (c - g) | c ∈ SFCI ∧ g ∈ GMFk, g ≠ c }   (11)

Here g ranges over the minimal generators of the closed itemset c.

o) Informative base of approximative association rules: it is called IB and defined as follows:

IB = { R: X → Y, Y ∈ SFCI, ƒ(X) ≤ Y, confidence(R) ≥ minconf, Supp(Y) ≥ minsup }   (12)

2) Extracting frequent closed itemsets: the algorithms
After presenting some basic notions, we introduce a set of algorithms designed to extract frequent closed itemsets.

a) Close [12]: this algorithm traverses the search space iteratively to extract the minimal generators, which are then used to extract the associated frequent closed itemsets. Each iteration involves two steps. The first is a self-join between the minimal generators found in the previous iteration, which forms a set denoted MGCk (k-Minimal Generators Candidates); each element is a triplet (minimal generator, closed itemset candidate: CIC, support of the CIC). The second step prunes MGCk by eliminating the triplets whose closed itemset has a support < minsup. The valid rules are generated using the frequent itemsets derived from the selected frequent closed itemsets. It should be noted that this step is costly and can generate many redundant and irrelevant rules.

b) A-Close [12]: to remedy the problem raised by the algorithm Close, [12] proposes a new algorithm called A-Close. It generates the frequent closed itemsets from the associated minimal generators.
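To make equation (11) concrete, the sketch below derives the generic base of exact rules for a small context by brute force. It is not Close or A-Close: it enumerates every itemset, which is exponential in the number of items and only suitable for illustration. The names `frequent_minimal_generators`, `generic_base` and the toy context `TOY` are assumptions. An itemset is taken to be a minimal generator when each subset obtained by removing one item has a strictly larger support, which is equivalent to having no proper subset with the same closure.

```python
from itertools import combinations

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def closure(itemset, transactions):
    """phi(itemset): intersection of the transactions that contain it."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)

def frequent_minimal_generators(transactions, minsup):
    """Brute-force enumeration of the frequent minimal generators with their supports."""
    items = sorted(frozenset().union(*transactions))
    generators = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            cand = frozenset(combo)
            supp = support(cand, transactions)
            # Minimal generator: removing any item strictly increases the support.
            if supp >= minsup and all(support(cand - {i}, transactions) > supp for i in cand):
                generators[cand] = supp
    return generators

def generic_base(transactions, minsup):
    """Exact rules g -> (c - g) with c = phi(g) and g != c, as in equation (11)."""
    rules = []
    for gen in frequent_minimal_generators(transactions, minsup):
        closed = closure(gen, transactions)
        if closed != gen:
            rules.append((gen, closed - gen))
    return rules

# Hypothetical toy context: five transactions over the items a..e.
TOY = [frozenset("acd"), frozenset("bce"), frozenset("abce"), frozenset("be"), frozenset("abce")]
generators = frequent_minimal_generators(TOY, minsup=2)
gb = generic_base(TOY, minsup=2)
```

On this toy context with minsup = 2 the base contains rules such as a → c and ab → ce. Each rule has confidence 1 because a generator and its closure have the same support.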
c) Titanic [15]: the algorithm Titanic performs a breadth-first, level-by-level traversal to extract the frequent minimal generators and then derive the frequent closed itemsets. To determine the k-generators, a self-join of the (k-1)-generators is carried out, and a candidate is deleted if one of its subsets is not frequent. In addition, to reduce the time needed to compute the support of the k-generators, an estimated support is associated with each of them; it is equal to the minimum of the supports of the (k-1)-generators that form it. For the empty set, the estimated support is equal to the cardinality of the extraction context. Every k-generator whose support is lower than minsup, or whose real support is equal to its estimated support, is removed from the set. It is noteworthy that, in the worst case, Titanic performs a number of accesses to the extraction context equal to the maximum size of the lists of candidate generators.

d) Closet [13]: this algorithm has two stages. The first uses the FP-tree structure to represent the search space as a tree, eliminating the non-frequent itemsets. The second uses this structure to extract the frequent closed itemsets by dividing the tree into subtrees, called conditional sub-contexts, in order to explore the search space in depth. This algorithm therefore requires two accesses to the extraction context: the first extracts the list of frequent 1-itemsets and the second builds the FP-tree. The sub-contexts of the frequent 1-itemsets are processed first, in order of increasing support. Each sub-context contains only the items that co-occur with the 1-itemset in question, called I', and have a support above minsup. The corresponding frequent closed itemset is the concatenation of I' with all the items having the same support. The construction of the sub-contexts continues recursively, knowing that an item already processed is excluded, since all the frequent closed itemsets containing it have already been generated.
In addition, to reduce the extraction cost, the sub-context of an itemset is built only if this itemset is not covered by any frequent closed itemset already generated.

This algorithm has some disadvantages. The first is the processing cost of sorting the initial context by decreasing support. The second is the storage of a large number of sub-contexts produced by the recursive division of the initial context. In addition, checking whether a given itemset is included in one of the frequent closed itemsets already found requires keeping this list in memory throughout the processing. It should be noted that this algorithm does not manage a list of candidates; however, with a low support threshold and a sparse context, very few itemsets are included in the frequent closed itemsets already generated, so a sub-context has to be built for almost every itemset.

e) Charm [17]: in order to reduce the number of accesses to the extraction context as well as the number of candidates generated for the extraction of frequent closed itemsets, Charm uses both the set of closed itemsets and the identifiers of the transactions to which they belong (Tidset). It therefore uses a structure called the IT-tree (Itemset-Tidset Tree), where each node represents a pair of the form (frequent closed itemset candidate, Tidset). First, the 1-itemsets are added to the structure by decreasing support. Then the tree is traversed from the root, from left to right and in depth, to determine the frequent closed itemsets. For every two closed itemset candidates I1 and I2 having the same parent, there are four cases to check:
• If Tidset(I1) = Tidset(I2) then φ(I1) = φ(I2) = φ(I1 ∪ I2): all occurrences of I1 are replaced by (I1 ∪ I2) and I2 is removed from the tree, since its closure is the same as that of (I1 ∪ I2).
• If Tidset(I1) ⊂ Tidset(I2) then φ(I1) ≠ φ(I2) and φ(I1) = φ(I1 ∪ I2): all occurrences of I1 are replaced by (I1 ∪ I2), but I2 is not removed from the tree because it can be a generator of another frequent closed itemset.
• If Tidset(I1) ⊃ Tidset(I2) then φ(I1) ≠ φ(I2) and φ(I2) = φ(I1 ∪ I2): I2 is removed from the tree and a node of the form ((I1 ∪ I2), (Tidset(I1) ∩ Tidset(I2))) is added to the list of descendants of I1.
• If Tidset(I1) ≠ Tidset(I2) (neither contains the other) then φ(I1) ≠ φ(I2) ≠ φ(I1 ∪ I2): the occurrences of I1 and I2 do not change, and a node of the form ((I1 ∪ I2), (Tidset(I1) ∩ Tidset(I2))) is added to the list of descendants of I1 if (I1 ∪ I2) is frequent; otherwise the tree remains the same.

It should be noted that Charm performs a single access to the extraction context, to determine the transaction lists of the 1-itemsets. In addition, to reduce memory usage, it stores the Tidsets incrementally using a data representation called Diffset: the Diffset of a candidate is the difference between its Tidset and that of its immediate parent. This also reduces the processing time, since the intersection for two candidates I1 and I2 is obtained from the difference between Tidset(I1) and that of its parent P on the one hand, and the difference between Tidset(I2) and that of its parent P on the other. Despite this, Charm is considered to be among the algorithms that consume a lot of memory.

f) Prince [9]: this algorithm extracts the minimal generators and builds a partially ordered structure, called the frequent minimal generators lattice, which is then scanned vertically (from bottom to top) to find the frequent closed itemsets and to extract the informative association rules, which are fewer, non-redundant and without loss of information.

First, Prince extracts the minimal generators of the initial context (∅ is the first generator extracted; its support is equal to the cardinality of the initial context). It determines all the candidate k-generators by performing, at each level k, a self-join of the (k-1)-generators. It then eliminates any candidate of size k if at least one of its subsets is not a minimal generator, or if its support is equal to that of one of its subsets.
GMFk is the union of the sets of frequent minimal generators determined at each level, while the non-frequent ones form GBd-.

Second, GMFk and GBd- are used to form the minimal generators lattice, by comparing each minimal generator g with the list L of immediate successors of its subsets of size k-1. If L is empty then g is added to this list; otherwise, four cases are possible for each g1 ∈ L, where Cg and Cg1 are the equivalence classes of g and g1:
• If (g ∪ g1) is a minimal generator then Cg and Cg1 are not comparable.
• If Supp(g) = Supp(g1) = Supp(g ∪ g1) then g and g1 belong to the same class.
• If Supp(g ∪ g1) = Supp(g) < Supp(g1) then g becomes the successor of g1.
• If Supp(g) < Supp(g1) and Supp(g ∪ g1) ≠ Supp(g) then Cg and Cg1 are not comparable.

If (g ∪ g1) is not a minimal generator, its support is computed by applying the following proposition [15]: let GMk = GMFk ∪ GBd- (the set of all generators); an itemset I' is not a generator if:

Supp(I') = min{ Supp(gi) | gi ∈ GMk ∧ gi ⊂ I' }   (13)

The search for Supp(g ∪ g1) stops as soon as one of the subsets of (g ∪ g1) has a support strictly lower than those of g and g1, because this implies that Cg and Cg1 are not comparable.

After constructing the minimal generators lattice, Prince determines the frequent closed itemset of each equivalence class, starting from C∅ and moving to the top, and builds the Iceberg lattice by applying the following proposition: let I1 and I2 be two frequent closed itemsets such that I1 covers I2 in the partial order relation, and let GMF_I1 be the set of frequent minimal generators of I1:

I1 = ∪{ g | g ∈ GMF_I1 } ∪ I2   (14)

The two lattices are used to extract the exact and approximative rules. The rules with confidence = 1 are exact; they are implications extracted within each node (intra-node). The approximative rules, whose confidence is ≥ minconf, are implications involving two comparable equivalence classes, that is, implications between nodes.
As proved in [9], all the generated rules are non-redundant and guarantee that there is no loss of information. It should be noted, however, that the complexity of the first step (the extraction of the frequent minimal generators) is exponential, which implies a high overall processing time in the case of sparse contexts.

g) GrGrowth [11]: this algorithm was developed to mine frequent generators and the positive border. It uses the compact data structure FP-tree and adopts the pattern growth approach. It constructs a conditional database for each frequent generator. The algorithm uses a depth-first search strategy to explore the search space, which is much more efficient than the breadth-first search strategy adopted by most of the existing generator mining algorithms. GrGrowth prunes the positive border during the mining process to save mining cost. Generator-based representations normally rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes exceeds the total number of frequent itemsets, and the positive border is usually smaller. In addition, a set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets.

3) Conclusion
This approach has two advantages. First, it reduces the run time and the storage space compared with the first approach, because the number of frequent closed itemsets and their processing time are smaller than those of the extraction of all frequent itemsets. Second, it reduces the number of association rules extracted, without loss of information.

IV. COMPARATIVE STUDY

In this section, we identify four characteristics used to classify the algorithms presented above. The first characteristic is the computational complexity. The second is the kind of itemsets extracted (FI or FCI). The third is the strategy of association rules extraction from the initial context, of which there are three types:
• Test-and-Generate: browse the database level by level, generating a set of candidates associated with each level and applying some metrics to reduce the result set (pruning).
• Divide-and-Generate: divide the database into subsets and apply the itemset extraction process to each subset, in order to reduce the number of candidates. As in the first category, a pruning step is performed.
• Hybrid: process the database in depth but without division. In addition, a statistical metric and heuristics are used to reduce the resulting set of closed itemsets.
The last characteristic, called "Data structures", includes two sub-classes: the structure used to represent the initial context, such as Tidsets, FP-tree and IT-tree, and the one used for storing the resulting itemsets, such as hash-table, hash-tree, Trie and sparse matrix (see Table I).

TABLE I. COMPARATIVE STUDY OF ASSOCIATION RULES EXTRACTION ALGORITHMS

Algorithm    Complexity      Itemsets extracted   Database structure   Structure of itemsets storage
Apriori      O(mn2^m)        FI                   -                    hash-tree
AprioriTid   O(mn2^m)        FI                   Tidsets              hash-tree
Partition    O(2^m P)*       FI                   -                    hash-tree
Dic          O(2^m M)**      FI                   -                    Trie
Eclat        O(n^2 m^2)      FI                   Tidsets              sparse matrix
FP-Growth    O(mn + m^3)     FI                   FP-tree              hash table
Close        O(2^m n^2 m)    FCI                  -                    Trie
A-Close      O(2^m n^2 m)    FCI                  -                    Trie
Titanic      O(2^m n^2 m)    FCI                  -                    Trie
Closet       O(mn + m^4)     FCI                  FP-tree              Trie
Charm        O(2^m + n^2)    FCI                  IT-tree              Trie
Prince       O(2^m)          FCI                  -                    Trie
Gr-Growth    O(mn + m^4)     FCI                  FP-tree              hash table

* P: number of partitions. ** M: number of transactions.
Each algorithm is also marked (X) under one of the three strategy columns (Test-and-Generate, Divide-and-Generate, Hybrid); the column of each mark is not recoverable from the source layout.

V. CONCLUSION

In this paper, we presented a state of the art of the best-known algorithms of association rules extraction. We observe that the algorithms based on the generation of the set of frequent itemsets can be applied easily to huge databases. However, they have some limits, such as the redundancy and limited usefulness of the association rules produced. The second kind of algorithms, based on the FCA approach, can reduce the number of association rules while preserving their relevance. However, their memory cost and run time increase, especially when the formal context is sparse.

REFERENCES

[1] R. Agrawal, T. Imielinski and A. N. Swami, Mining association rules between sets of items in large databases, In Proceedings of the International Conference on Management of Data, ACM SIGMOD'93, Washington, D.C., USA, pages 207-216, May 1993.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, In J. B. Bocca, M. Jarke and C. Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, pp. 478-499, June 1994.
[3] Y. Bastide, N. Pasquier, R. Taouil, L. Lakhal and G. Stumme, Mining minimal non-redundant association rules using frequent closed itemsets, Proceedings of the Intl. Conference DOOD'2000, LNCS, Springer-Verlag, July 2000, pp. 972-986.
[4] S. Ben Yahia, C. Latiri, G.W. Mineau and A. Jaoua, Découverte des règles associatives non redondantes – application aux corpus textuels, In M.S. Hacid, Y. Kodratoff and D. Boulanger, editors, EGC, volume 17 of Revue des Sciences et Technologies de l'Information, série RIA ECA, pages 131-144. Hermes Sciences Publications, 2003.
[5] S. Brin, R. Motwani, J.D. Ullman and S. Tsur, Dynamic itemset counting and implication rules for market basket data, In Proceedings ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, USA, J. Peckham, editor, pp. 255-264, ACM Press, 1997.
[6] W. Cheung, W. Heung and O. Zaiane, Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint, Proceedings of the Seventh International Database Engineering and Applications Symposium (IDEAS 2003), Hong Kong, China, July 2003.
[7] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999.
[8] J. Han, J. Pei and Y. Yin, Mining frequent patterns without candidate generation, ACM-SIGMOD Int. Conf. on Management of Data, pp. 1-12, May 2000.
[9] T. Hamrouni, S. Ben Yahia and Y. Slimani, Prince: An algorithm for generating rule bases without closure computations, In 7th International Conference on Data Warehousing and Knowledge Discovery DaWaK'05, pages 346-355, Copenhagen, Denmark, 2005. Springer-Verlag, LNCS.
[10] R. L. Kruse and A. J. Ryba, Data structures and program design in C++, Prentice Hall, 1999.
[11] G. Liu, J. Li and L. Wong, A new concise representation of frequent itemsets using generators and a positive border, Knowledge and Information Systems, 17(1): 35-56, 2008.
[12] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Efficient Mining of Association Rules Using Closed Itemset Lattices, Information Systems Journal, vol. 24, no 1, 1999, pp. 25-46.
[13] J. Pei, J. Han, R. Mao, S. Nishio, S. Tang and D. Yang, CLOSET: An efficient algorithm for mining frequent closed itemsets, Proceedings of the ACM SIGMOD DMKD'00, Dallas, TX, 2000, pp. 21-30.
[14] A. Savasere, E. Omiecinsky and S. Navathe, An efficient algorithm for mining association rules in large databases, 21st Int'l Conf. on Very Large Databases (VLDB), September 1995.
[15] G. Stumme, R. Taouil, Y. Bastide, N. Pasquier and L. Lakhal, Computing Iceberg Concept Lattices with TITANIC, J. on Knowledge and Data Engineering (KDE), vol. 2, no 42, 2002, pp. 189-222.
[16] M. Zaki, S. Parthasarathy, M. Ogihara and W. Li, New algorithms for fast discovery of association rules, In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, D. Pregibon, R. Uthurusamy and M. Park, editors, pp. 283-296. AAAI Press, 1997.
[17] M. Zaki and C. J. Hsiao, CHARM: An Efficient Algorithm for Closed Itemset Mining, Proceedings of the 2nd SIAM International Conference on Data Mining, Arlington, April 2002, pp. 34-43.
