Itemset mining: A constraint programming perspective

Published: 01 August 2011

Abstract

The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques have been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages, which specify what the problem is rather than how a solution should be computed, and general solvers that are powerful enough to be used across a wide variety of applications and application domains. This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems, where the underlying procedures need to be modified in order to accommodate new types of constraints or novel combinations thereof.
Even though state-of-the-art data mining systems outperform the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art data mining methods, as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.
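
The declarative idea described in the abstract, stating constraints separately from the search that enforces them, can be illustrated with a small Python sketch. This is not the paper's model and it does not use a constraint programming solver: it is a naive generate-and-test enumeration over a made-up toy database, and every name in it (`TRANSACTIONS`, `cover`, `min_frequency`, `closed`, `max_cost`, `solve`, the item costs) is a hypothetical choice made for illustration. A real solver would prune the search through constraint propagation instead of testing every candidate.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c"},
    {"c"},
    {"a", "b"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def cover(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(TRANSACTIONS) if itemset <= t)


def min_frequency(threshold):
    """Constraint: at least `threshold` transactions cover the itemset."""
    return lambda itemset: len(cover(itemset)) >= threshold


def closed(itemset):
    """Constraint: adding any further item changes (shrinks) the cover."""
    covered = cover(itemset)
    return all(cover(itemset | {i}) != covered for i in ITEMS if i not in itemset)


def max_cost(costs, budget):
    """Constraint: the summed (hypothetical) item costs stay within `budget`."""
    return lambda itemset: sum(costs[i] for i in itemset) <= budget


def solve(constraints):
    """Generic generate-and-test search over all non-empty itemsets."""
    candidates = (
        frozenset(c)
        for k in range(1, len(ITEMS) + 1)
        for c in combinations(ITEMS, k)
    )
    return [s for s in candidates if all(check(s) for check in constraints)]


# The search routine never changes; only the list of constraints does.
frequent = solve([min_frequency(3)])
frequent_closed = solve([min_frequency(3), closed])
cheap_frequent = solve([min_frequency(3), max_cost({"a": 1, "b": 2, "c": 4}, 3)])
```

Moving from frequent to closed or cost-bounded itemsets only adds an entry to the constraint list passed to `solve`, which mirrors, in a very simplified way, the contrast the abstract draws with procedural miners whose algorithms must be modified for each new constraint.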

Published In

Artificial Intelligence, Volume 175, Issue 12-13
August, 2011
108 pages

Publisher

Elsevier Science Publishers Ltd.

United Kingdom

Author Tags

  1. Constraint programming
  2. Data mining
  3. Itemset mining

Qualifiers

  • Article

Cited By

  • (2023) Explanations for Itemset Mining by Constraint Programming: A Case Study Using ChEMBL Data. Advances in Intelligent Data Analysis XXI, pp. 208-221. doi:10.1007/978-3-031-30047-9_17. Online publication date: 12-Apr-2023
  • (2022) A Declarative Framework for Maximal k-plex Enumeration Problems. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 660-668. doi:10.5555/3535850.3535925. Online publication date: 9-May-2022
  • (2022) Towards Revenue Maximization with Popular and Profitable Products. ACM/IMS Transactions on Data Science 2:4, pp. 1-21. doi:10.1145/3488058. Online publication date: 24-May-2022
  • (2022) An efficient heuristic approach combining maximal itemsets and area measure for compressing voluminous table constraints. The Journal of Supercomputing 79:1, pp. 650-676. doi:10.1007/s11227-022-04667-1. Online publication date: 14-Jul-2022
  • (2022) The minimum description length principle for pattern mining: a survey. Data Mining and Knowledge Discovery 36:5, pp. 1679-1727. doi:10.1007/s10618-022-00846-z. Online publication date: 1-Sep-2022
  • (2021) Towards a Compact SAT-Based Encoding of Itemset Mining Tasks. Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 163-178. doi:10.1007/978-3-030-78230-6_11. Online publication date: 5-Jul-2021
  • (2020) Mining the Local Dependency Itemset in a Products Network. ACM Transactions on Management Information Systems 11:1, pp. 1-31. doi:10.1145/3384473. Online publication date: 17-Apr-2020
  • (2020) SAT-based models for overlapping community detection in networks. Computing 102:5, pp. 1275-1299. doi:10.1007/s00607-020-00803-y. Online publication date: 1-May-2020
  • (2020) A SAT-Based Approach for Mining High Utility Itemsets from Transaction Databases. Big Data Analytics and Knowledge Discovery, pp. 91-106. doi:10.1007/978-3-030-59065-9_8. Online publication date: 14-Sep-2020
  • (2020) SAT-based and CP-based declarative approaches for Top-Rank-K closed frequent itemset mining. International Journal of Intelligent Systems 36:1, pp. 112-151. doi:10.1002/int.22294. Online publication date: 2-Dec-2020
