Finding closed frequent item sets by intersecting transactions

C Borgelt, X Yang, R Nogales-Cadenas… - Proceedings of the 14th …, 2011 - dl.acm.org
Proceedings of the 14th International Conference on Extending Database …, 2011dl.acm.org
Most known frequent item set mining algorithms work by enumerating candidate item sets
and pruning infrequent candidates. An alternative method, which works by intersecting
transactions, is much less researched. To the best of our knowledge, there are only two
basic algorithms: a cumulative scheme, which is based on a repository with which new
transactions are intersected, and the Carpenter algorithm, which enumerates and intersects
candidate transaction sets. These approaches yield the set of so-called closed frequent item …
Most known frequent item set mining algorithms work by enumerating candidate item sets and pruning infrequent candidates. An alternative method, which works by intersecting transactions, is much less researched. To the best of our knowledge, there are only two basic algorithms: a cumulative scheme, which is based on a repository with which new transactions are intersected, and the Carpenter algorithm, which enumerates and intersects candidate transaction sets. These approaches yield the set of so-called closed frequent item sets, since any such item set can be represented as the intersection of some subset of the given transactions. In this paper we describe a considerably improved implementation scheme of the cumulative approach, which relies on a prefix tree representation of the already found intersections. In addition, we present an improved way of implementing the Carpenter algorithm. We demonstrate that on specific data sets, which occur particularly often in the area of gene expression analysis, our implementations significantly outperform enumeration approaches to frequent item set mining.
ACM Digital Library