Abstract
Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention, with very general proposals. In contrast, the minimal patterns have remained in the shadows, in particular because they are too numerous and difficult to extract. In this paper, we present a generic framework for exact and approximate minimal pattern mining by introducing the concept of a minimizable set system. This framework, based on set systems, addresses various languages, such as itemsets or strings, and at the same time different metrics, such as frequency. For instance, the free, \(\delta \)-free and essential patterns are naturally handled by our approach, as are the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate into a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that this algorithm is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.
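To make the notion of \(\delta \)-free itemsets more concrete, the following is a minimal sketch of a depth-first search with a naive minimality check. It is not the fast checking method of the chapter: it simply recounts supports by scanning the data. It assumes the usual characterization that an itemset X is \(\delta \)-free when removing any single item raises the support by more than \(\delta \) (with \(\delta = 0\) giving the free patterns). The function names `support`, `is_delta_free` and `mine_delta_free`, the `min_support` parameter and the toy dataset are hypothetical, chosen for illustration only.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset: FrozenSet[str], transactions: List[Set[str]],
                  delta: int = 0) -> bool:
    """Naive minimality check (illustrative assumption, not the chapter's method):
    every immediate subset must have a support larger by more than `delta`."""
    s = support(itemset, transactions)
    return all(support(itemset - {e}, transactions) - s > delta
               for e in itemset)


def mine_delta_free(transactions: List[Set[str]], delta: int = 0,
                    min_support: int = 1) -> List[FrozenSet[str]]:
    """Depth-first enumeration of the non-empty frequent delta-free itemsets."""
    items = sorted(set().union(*transactions))
    results: List[FrozenSet[str]] = []

    def dfs(prefix: FrozenSet[str], start: int) -> None:
        for i in range(start, len(items)):
            candidate = prefix | {items[i]}  # "Xe" in the chapter's notation
            if support(candidate, transactions) < min_support:
                continue  # infrequent: no superset can be frequent
            if not is_delta_free(candidate, transactions, delta):
                continue  # delta-freeness is downward closed, so prune here
            results.append(candidate)
            dfs(candidate, i + 1)

    dfs(frozenset(), 0)
    return results


# Toy dataset (hypothetical)
data = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
free_patterns = mine_delta_free(data, delta=0)
one_free_patterns = mine_delta_free(data, delta=1)
```

On this toy dataset, the itemset {a, c} is not free because it has the same support as {c}, so the search does not extend it. Pruning on the prefix is safe in this sketch because every subset of a \(\delta \)-free itemset is itself \(\delta \)-free.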
Notes
- 1. We use the notation Xe instead of \(X \cup \{e\}\).
- 2. As this prototype mines non-derivable itemsets, it enables us to compute free patterns when the depth parameter is set to 1.
- 3.
- 4. This dataset is provided with \({{\textsc {maxMotif}}}\): http://research.nii.ac.jp/~uno/codes.htm.
Acknowledgments
This article has been partially funded by the Hybride project (ANR-11-BS02-0002).
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Soulet, A., Rioult, F. (2017). Exact and Approximate Minimal Pattern Mining. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_4
DOI: https://doi.org/10.1007/978-3-319-45763-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: Engineering, Engineering (R0)