Closed patterns meet n-ary relations

Published: 23 March 2009

Abstract

Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing this task to n-ary relations (n ≥ 2) is a timely challenge. It matters for many applications, for example, when adding the time dimension to the popular objects × features binary case. The generality of the task (no assumption is made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called Data-Peeler. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti)monotonic constraints. This new class of constraints generalizes both monotonic and antimonotonic constraints. In the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. This performance is due to a new enumeration strategy that efficiently enforces the closedness property. The relevance of the extracted closed n-sets is assessed on real-life 3- and 4-ary relations. Beyond natural 3- or 4-ary relations, expanding a relation with an additional attribute can help enforce rather abstract constraints such as robustness with respect to binarization. Furthermore, a collection of closed n-sets is shown to be an excellent starting point for computing a tiling of the dataset.
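To make the notion of a closed n-set concrete, here is a minimal brute-force sketch in Python. It assumes the usual definition: an n-set (X1, ..., Xn) is closed when every tuple of X1 × ... × Xn belongs to the relation and no element can be added to any Xi without breaking that property. The function names `is_connected` and `is_closed` and the toy objects × features × times relation are hypothetical, chosen only for illustration. This is not the Data-Peeler enumeration strategy, whose details the abstract does not give.

```python
from itertools import product

def is_connected(relation, n_set):
    """True if every tuple of X1 x ... x Xn belongs to the relation."""
    return all(t in relation for t in product(*n_set))

def is_closed(relation, domains, n_set):
    """True if n_set is connected and cannot be extended by any single element."""
    if not is_connected(relation, n_set):
        return False
    for i, domain in enumerate(domains):
        for e in domain - n_set[i]:
            # Try adding element e to the i-th component only.
            extended = n_set[:i] + (n_set[i] | {e},) + n_set[i + 1:]
            if is_connected(relation, extended):
                return False  # a strictly larger connected n-set exists
    return True

# Toy ternary relation: objects x features x times (illustrative data only).
domains = ({"o1", "o2"}, {"f1", "f2"}, {"t1", "t2"})
relation = {
    ("o1", "f1", "t1"), ("o1", "f1", "t2"),
    ("o2", "f1", "t1"), ("o2", "f1", "t2"),
    ("o1", "f2", "t1"),
}

closed_example = ({"o1", "o2"}, {"f1"}, {"t1", "t2"})
not_closed_example = ({"o1"}, {"f1"}, {"t1", "t2"})  # o2 can still be added

print(is_closed(relation, domains, closed_example))
print(is_closed(relation, domains, not_closed_example))
```

Each check costs time proportional to the size of the Cartesian product, so this sketch only suits tiny relations. It says nothing about how an efficient miner enumerates candidates or prunes with constraints.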

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 1
March 2009
251 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1497577
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 March 2009
Accepted: 01 November 2008
Revised: 01 October 2008
Received: 01 April 2008
Published in TKDD Volume 3, Issue 1

Author Tags

  1. n-ary relations
  2. closed patterns
  3. constraint properties
  4. constraint-based mining
  5. tiling

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Co-clustering: A Survey of the Main Methods, Recent Trends, and Open Problems. ACM Computing Surveys 57:2 (1-33). DOI: 10.1145/3698875. Online publication date: 4-Oct-2024
  • (2024) On GNN explainability with activation rules. Data Mining and Knowledge Discovery 38:5 (3227-3261). DOI: 10.1007/s10618-022-00870-z. Online publication date: 1-Sep-2024
  • (2023) Constrained regret minimization for multi-criterion multi-armed bandits. Machine Learning 112:2 (431-458). DOI: 10.1007/s10994-022-06291-9. Online publication date: 6-Jan-2023
  • (2023) Efficient learning of large sets of locally optimal classification rules. Machine Learning 112:2 (571-610). DOI: 10.1007/s10994-022-06290-w. Online publication date: 23-Jan-2023
  • (2022) A Novel Framework for Unification of Association Rule Mining, Online Analytical Processing and Statistical Reasoning. IEEE Access (1-1). DOI: 10.1109/ACCESS.2022.3142537. Online publication date: 2022
  • (2022) Learning multi-agent coordination through connectivity-driven communication. Machine Learning 112:2 (483-514). DOI: 10.1007/s10994-022-06286-6. Online publication date: 29-Dec-2022
  • (2022) Computing triadic generators and association rules from triadic contexts. Annals of Mathematics and Artificial Intelligence 90:11-12 (1083-1105). DOI: 10.1007/s10472-022-09784-4. Online publication date: 1-Dec-2022
  • (2022) A Lossless Data Reduction for Mining Constrained Patterns in n-ary Relations. Machine Learning and Knowledge Discovery in Databases (581-596). DOI: 10.1007/978-3-662-44851-9_37. Online publication date: 10-Mar-2022
  • (2022) On the Pareto-Optimal Solutions in the Multimodal Clustering Problem. Recent Trends in Analysis of Images, Social Networks and Texts (179-194). DOI: 10.1007/978-3-031-15168-2_15. Online publication date: 30-Aug-2022
  • (2022) Detecting Communities in Complex Networks Using Formal Concept Analysis. Advances in Knowledge Discovery and Management (77-105). DOI: 10.1007/978-3-030-90287-2_5. Online publication date: 15-Mar-2022