Abstract
This paper provides an overview of the current state of the art in using constraints in knowledge discovery and data mining. Using constraints in a data mining task requires specific tools for defining them and for satisfying them during knowledge extraction. This survey organizes existing studies into three groups, covering classification, clustering and pattern mining, according to whether the constraints are on the data, the models or the measures, respectively. We consider the distinction between hard and soft constraint satisfaction, and the distinction between the knowledge extraction phases in which constraints are taken into account. In addition to discussing how constraints can be used in data mining, we show how constraint-based languages can be used throughout the data mining process.
Notes
The basic idea of a support vector machine is to represent the decision boundary using a subset of the training examples, known as support vectors. Given a set of candidate hyperplanes, the classifier selects one hyperplane to represent its decision boundary, based on how well each is expected to perform on the examples to classify.
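As a rough illustration of this note, the sketch below picks one hyperplane from a small candidate set. The selection criterion used here, the largest geometric margin on the training examples, is an assumption standing in for "expected performance"; the function names and the toy data are hypothetical, and a real support vector machine solves an optimization problem rather than enumerating candidates.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


def select_hyperplane(candidates, points, labels):
    """Return the candidate (w, b) with the largest geometric margin (assumed criterion)."""
    return max(
        candidates,
        key=lambda wb: geometric_margin(wb[0], wb[1], points, labels),
    )


def support_vectors(w, b, points, labels, tol=1e-9):
    """Training points that lie exactly on the margin of the chosen hyperplane."""
    margin = geometric_margin(w, b, points, labels)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [
        x
        for x, y in zip(points, labels)
        if abs(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm - margin) <= tol
    ]


# Toy data: two classes separated along the first coordinate.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]
candidates = [((1.0, 0.0), -1.5), ((1.0, 0.0), -2.0), ((1.0, 0.0), -2.5)]

best_w, best_b = select_hyperplane(candidates, points, labels)
vectors = support_vectors(best_w, best_b, points, labels)
```

In this toy case the middle candidate is selected, and the two points closest to the boundary are the ones that determine it.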
The Euclidean distance, i.e. the straight-line distance, is an ineffective measure in the presence of obstacles and facilitators. An obstacle is a physical object that obstructs reachability among individuals. Conversely, a facilitator is a physical object that enhances reachability among individuals.
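A minimal sketch of why the straight-line distance can mislead when obstacles are present. It assumes a hypothetical grid world in which cells marked 1 are obstacles and movement is restricted to four directions; facilitators are not modelled here, and the function names are illustrative only.

```python
import math
from collections import deque


def euclidean(a, b):
    """Straight-line distance between two grid cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def obstructed_distance(grid, start, goal):
    """Length of the shortest 4-connected path through free cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


# A wall (cells marked 1) separates the two individuals in the top row.
grid = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
a, b = (0, 0), (0, 2)

straight = euclidean(a, b)                   # ignores the wall
around = obstructed_distance(grid, a, b)     # has to walk around it
```

The two individuals look close under the Euclidean distance, yet the path that respects the obstacle is three times as long, which is the kind of gap a clustering constraint on reachability is meant to capture.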
MiniZinc is a constraint-modeling language. It is sufficiently high-level to express most constraint problems easily, but low-level enough to be mapped onto existing solvers easily and consistently (Nethercote et al. 2007). The MiningZinc language (Guns et al. 2013b) cited in this section is an extension of MiniZinc for data mining.
Inductive databases extend the closure principle to the knowledge discovery field. The principle simply states that the output of a query for knowledge extraction can be the input of another query of a compatible type (Imielinski and Mannila 1996).
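The closure principle can be sketched as follows, under the assumption that both data and extracted patterns are represented as lists of rows. The function names `frequent_itemsets` and `select` are hypothetical, and the brute-force enumeration is only meant for toy inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Mining query: returns one row per itemset contained in at least min_support transactions."""
    items = sorted({item for t in transactions for item in t})
    rows = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= set(t))
            if support >= min_support:
                rows.append({"itemset": frozenset(candidate), "support": support})
    return rows


def select(rows, predicate):
    """Ordinary selection query: works on any list of rows, mined or not."""
    return [row for row in rows if predicate(row)]


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]

# The output of the mining query is itself a relation ...
patterns = frequent_itemsets(transactions, min_support=2)
# ... so it can be the input of a further query.
pairs = select(patterns, lambda row: len(row["itemset"]) == 2)
```

Because the mined patterns are stored in the same form as the data, the second query needs no special treatment of patterns versus raw tuples.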
The system automatically captures the properties of such constraints (e.g. monotonicity and anti-monotonicity), to be used directly during the extraction of the mining model.
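To make the two properties concrete, the sketch below tests them by brute force over all subsets of a small item universe. This is an illustrative check only, not the way the system referred to in this note derives the properties; the price table and the constraint names are hypothetical.

```python
from itertools import combinations


def powerset(items):
    """All subsets of a small item universe, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]


def is_anti_monotone(constraint, items):
    """True if every subset of a satisfying set also satisfies the constraint."""
    sets = powerset(items)
    return all(
        constraint(sub)
        for s in sets if constraint(s)
        for sub in sets if sub <= s
    )


def is_monotone(constraint, items):
    """True if every superset of a satisfying set also satisfies the constraint."""
    sets = powerset(items)
    return all(
        constraint(sup)
        for s in sets if constraint(s)
        for sup in sets if s <= sup
    )


# Hypothetical item prices (all non-negative).
price = {"a": 1, "b": 2, "c": 4}


def total_at_most_5(itemset):
    return sum(price[i] for i in itemset) <= 5


def total_at_least_3(itemset):
    return sum(price[i] for i in itemset) >= 3


upper_bound_props = (is_anti_monotone(total_at_most_5, price), is_monotone(total_at_most_5, price))
lower_bound_props = (is_anti_monotone(total_at_least_3, price), is_monotone(total_at_least_3, price))
```

An anti-monotone constraint allows a miner to discard every superset of a violating itemset, while a monotone one guarantees that every superset of a satisfying itemset also satisfies it.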
References
Agrawal R, Srikant R, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, pp 207–216
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK, Choi HJ (2012) Single-pass incremental and interactive mining for weighted frequent patterns. Expert Syst Appl 39(9):7976–7994
An A, Stefanowski J, Ramanna S, Butz CJ, Pedrycz W, Wang G (eds) (2007) Rough sets, fuzzy sets, data mining and granular computing. In: Proceedings of the 11th international conference, RSFDGrC 2007, Toronto, Canada, May 14–16, 2007, (Lecture Notes in Computer Science), vol 4482. Springer
Antunes C (2009) Pattern mining over star schemas in the Onto4AR framework. In: Proceedings of the IEEE international conference on data mining (ICDM) workshops, pp 453–458
Antunes C, Oliveira AL (2003) Sequence mining in categorical domains: Incorporating constraints. In: Proceedings of the 3rd international conference on machine learning and data mining in pattern recognition (MLDM), pp 239–251
Antunes C, Oliveira A (2004) Constraint relaxations for discovering unknown sequential patterns. In: Proceedings of the third international workshop on knowledge discovery in inductive databases (KDID), pp 11–32
Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. In: Simonis H (ed) Integration of AI and OR techniques in constraint programming: proceedings of the 11th international conference, CPAIOR 2014, Cork, Ireland, May 19–23, 2014. Lecture Notes in Computer Science, vol 8451, pp 438–454. Springer. doi:10.1007/978-3-319-07046-9_31
Bade K, Nürnberger A (2006) Personalized hierarchical clustering. In: IEEE/ACM international conference on web intelligence (WIC), pp 181–187
Bade K, Nürnberger A (2008) Creating a cluster hierarchy under constraints of a partially known hierarchy. In: Proceedings of the SIAM international conference on data mining (SDM), pp 13–24
Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395
Banerjee A, Ghosh J (2008) Clustering with balancing constraints. Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton, pp 171–200
Baralis E, Garza P, Quintarelli E, Tanca L (2007) Answering XML queries by means of data summaries. ACM Trans Inf Syst J 25(3):10–16
Baralis E, Cagliero L, Cerquitelli T, Garza P (2012) Generalized association rule mining with constraints. Inf Sci 194:68–84
Baralis E, Cerquitelli T, Chiusano S (2005) Index support for frequent itemset mining in a relational DBMS. In: Proceedings of the 21st international conference on data engineering (ICDE), pp 754–765
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. In: Proceedings of the twentieth international conference on machine learning (ICML), pp 11–18
Basu S, Davidson I, Wagstaff KL (2008) Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton
Basu S, Banerjee A, Mooney RJ (2004a) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the Fourth SIAM international conference on data mining (SDM)
Basu S, Bilenko M, Mooney RJ (2004b) A probabilistic framework for semi-supervised clustering. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 59–68
Bellandi A, Furletti B, Grossi V, Romei A (2007) Ontology-driven association rules extraction: a case study. In: Proceedings of the international workshop on contexts and ontologies: representation and reasoning (C&O:RR), pp 1–10
Bellandi A, Furletti B, Grossi V, Romei A (2008) Ontological support for association rule mining. In: Proceedings of the 26th IASTED international conference on artificial intelligence and applications (AIA), AIA ’08. ACTA Press, Anaheim, pp 110–115. http://dl.acm.org/citation.cfm?id=1712759.1712781
Bentayeb F, Darmont J (2002) Decision tree modeling with relational views. In: Proceedings of the 13th international symposium on foundations of intelligent systems (ISMIS), pp 423–431
Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517
Bernhardt J, Chaudhuri S, Fayyad U, Netz A (2001) Integrating data mining with SQL databases: OLE DB for data mining. In: Proceedings of the 17th international conference on data engineering (ICDE), pp 379–387
Bernstein A, Mannor S, Shimkin N (2010) Online classification with specificity constraints. In: Proceedings of the 24th annual conference on neural information processing systems (NIPS), pp 190–198
Bertsekas DP (1991) Linear network optimization: algorithms and codes. MIT Press, Cambridge. http://opac.inria.fr/record=b1089011
Besson J, Pensa RG, Robardet C, Boulicaut JF (2006) Knowledge discovery in inductive databases: 4th international workshop, KDID 2005, Porto, Portugal, October 3, 2005, Revised selected and invited papers, chap. Constraint-based mining of fault-tolerant patterns from boolean data. Springer, Berlin Heidelberg, pp 55–71. doi:10.1007/11733492_4
Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on machine learning (ICML), ICML ’04. ACM, New York, pp 11–18. doi:10.1145/1015330.1015360
Bistarelli S, Montanari U, Rossi F (1997) Semiring-based constraint satisfaction and optimization. J ACM 44(2):201–236. doi:10.1145/256303.256306
Bistarelli S, Bonchi F (2007) Soft constraint based pattern mining. Data Knowl Eng 62(1):118–137
Blaszczynski J, Deng W, Hu F, Slowinski R, Szelag M, Wang G (2012) On different ways of handling inconsistencies in ordinal classification with monotonicity constraints. In: Greco S, Bouchon-Meunier B, Coletti G, Fedrizzi M, Matarazzo B, Yager RR (eds) Advances on computational intelligence: 14th international conference on information processing and management of uncertainty in knowledge-based systems, IPMU 2012, Catania, Italy, July 9–13, 2012. Proceedings, Part I, Communications in Computer and Information Science, vol 297. Springer, pp 300–309. doi:10.1007/978-3-642-31709-5_31
Blaszczynski J, Slowinski R, Szelag M (2010) Probabilistic rough set approaches to ordinal classification with monotonicity constraints. In: Computational intelligence for knowledge-based systems design, 13th international conference on information processing and management of uncertainty, IPMU 2010, pp 99–108
Blockeel H, Calders T, Fromont É, Goethals B, Prado A, Robardet C (2012) An inductive database system based on virtual mining views. Data Min Knowl Discov 24(1):247–287
Blockeel H, Calders T, Fromont É, Goethals B, Prado A (2008a) Mining views: database views for data mining. In: Alonso G, Blakeley JA, Chen ALP (eds) Proceedings of the 24th international conference on data engineering, ICDE 2008, April 7–12, 2008, Cancún, México. IEEE Computer Society, pp 1608–1611. doi:10.1109/ICDE.2008.4497633
Blockeel H, Calders T, Fromont É, Goethals B, Prado A, Robardet C (2008b) An inductive database prototype based on virtual mining views. In: Li Y, Liu B, Sarawagi S (eds) Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, NV, August 24–27, 2008. ACM, pp 1061–1064. doi:10.1145/1401890.1402019
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) ExAnte: a preprocessing method for frequent-pattern mining. IEEE Intell Syst 20(3):25–31
Bonchi F, Giannotti F, Lucchese C, Orlando S, Perego R, Trasarti R (2009) A constraint-based querying system for exploratory pattern discovery. Inf Syst 34(1):3–27
Bonchi F, Lucchese C (2007) Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng 60(2):377–399
Boulicaut J, Jeudy B (2010) Constraint-based data mining. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, 2nd edn. Springer, New York. doi:10.1007/978-0-387-09823-4_17
Boulicaut JF, Masson C (2005) Data mining query languages. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook. Springer, New York, pp 715–727
Bradley PS, Bennett KP, Demiriz A (2000) Constrained k-means clustering. In: Technical report, MSR-TR-2000-65, Microsoft Research
Brunner C, Fischer A, Luig K, Thies T (2012) Pairwise support vector machines and their application to large scale problems. J Mach Learn Res 13(1):2279–2292. http://dl.acm.org/citation.cfm?id=2503308.2503316
Bucilă C, Gehrke J, Kifer D, White W (2003) DualMiner: a dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272
Bult JR, Wansbeek TJ (1995) Optimal selection for direct mail. Market Sci 14(4):378–394
Capelle M, Masson C, Boulicaut J (2003) Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In: Proceedings of the third SIAM international conference on data mining (SDM), pp 316–320
Cerf L, Besson J, Robardet C, Boulicaut J (2009) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data (TKDD). doi:10.1145/1497577.1497580
Cerf L, Besson J, Nguyen K, Boulicaut J (2013) Closed and noise-tolerant patterns in n-ary relations. Data Min Knowl Discov 26(3):574–619. doi:10.1007/s10618-012-0284-8
Ceri S, Meo R, Psaila G (1998) An extension to SQL for mining association rules. Data Min Knowl Discov 2(2):195–224. doi:10.1023/A:1009774406717
Chand C, Thakkar A, Ganatra A (2012a) Sequential pattern mining: survey and current research challenges. Int J Soft Comput Eng (IJSCE) 2(1):2231–2307
Chand C, Thakkar A, Ganatra A (2012b) Target oriented sequential pattern mining using recency and monetary constraints. Int J Comput Appl 45(10):12–18
Chang JH (2011) Mining weighted sequential patterns in a sequence database with a time-interval weight. Knowl Based Syst 24(1):1–9
Chen E, Cao H, Li Q, Qian T (2008) Efficient strategies for tough aggregate constraint-based sequential pattern mining. Inf Sci 178(6):1498–1518
Chen YL, Kuo MH, Wu SY, Tang K (2009) Discovering recency, frequency, and monetary (RFM) sequential patterns from customers’ purchasing data. Electron Commer Res Appl 8(5):241–251
Coleman T, Saunderson J, Wirth A (2008) Spectral clustering with inconsistent advice. In: Proceedings of the twenty-fifth international conference on machine learning (ICML), pp 152–159
Costa JA, Hero AO III (2005) Classification constrained dimensionality reduction. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 1077–1080
Dao TBH, Duong KC, Vrain C (2013) A declarative framework for constrained clustering. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) ECML/PKDD (3), Lecture Notes in Computer Science, vol 8190. Springer, pp 419–434. doi:10.1007/978-3-642-40994-3
Dao TBH, Duong KC, Vrain C (2015) Constrained minimum sum of squares clustering by constraint programming. In: Proceedings of the 21st international conference on principles and practice of constraint programming (CP 2015). Cork, Ireland, pp 557–573. https://hal.archives-ouvertes.fr/hal-01168193
Davidson I, Ravi SS (2005a) Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Knowledge discovery in databases: PKDD 2005, 9th European conference on principles and practice of knowledge discovery in databases (PKDD), pp 59–70
Davidson I, Ravi SS (2005b) Clustering with constraints: feasibility issues and the \(k\)-means algorithm. In: Kargupta H, et al. (eds) Proceedings of the 2005 SIAM international conference on data mining, pp 138–149. doi:10.1137/1.9781611972757.13
Davidson I, Ravi SS (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the twenty-first national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference (AAAI), pp 336–341
Davidson I, Ravi SS (2007) The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Min Knowl Discov 14(1):25–61
Davidson I, Ravi SS (2009) Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results. Data Min Knowl Discov 18(2):257–282
Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Knowledge discovery in databases: PKDD 2006, 10th European conference on principles and practice of knowledge discovery in databases (PKDD), pp 115–126
Dawson S, di Vimercati SDC, Samarati P (1999) Specification and enforcement of classification and inference constraints. In: IEEE symposium on security and privacy, pp 181–195
De Raedt L, Guns T, Nijssen S (2010) Constraint programming for data mining and machine learning. In: Fox M, Poole D (eds) Proceedings of the twenty-fourth AAAI conference on artificial intelligence, AAAI 2010, Atlanta, July 11–15, 2010. AAAI Press, pp 1671–1675. http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1837
De Raedt L (2002) A perspective on inductive databases. SIGKDD Explor 4(2):69–77. doi:10.1145/772862.772871
Demiriz A, Bennett KP, Bradley PS (2008) Using assignment constraints to avoid empty clusters in k-means clustering. Constrained clustering: advances in algorithms, theory, and applications. Chapman and Hall/CRC, Boca Raton, pp 201–220
Druck G, Mann GS, McCallum A (2008) Learning from labeled features using generalized expectation criteria. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 595–602
Duivesteijn W, Feelders A (2008) Nearest neighbour classification with monotonicity constraints. Mach Learn Knowl Discov Databases Eur Conf ECML/PKDD 2008:301–316
Dzeroski S, Goethals B, Panov P (2010) Inductive databases and constraint-based data mining. Springer, New York
Euler T, Klinkenberg R, Mierswa I, Scholz M, Wurst M (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 935–940
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
Fiot C, Laurent A, Teisseire M (2009) Softening the blow of frequent sequence analysis: soft constraints and temporal accuracy. Int J Web Eng Technol 5(1):24–47
Fromont É, Blockeel H, Struyf J (2006) Integrating decision tree learning into inductive databases. In: Proceedings of the 5th international workshop on knowledge discovery in inductive databases (KDID), pp 81–96
Fu Y, Han J (1995) Meta-rule-guided mining of association rules in relational databases. In: Proceedings of the post-conference workshops on integration of knowledge discovery in databases with deductive and object-oriented databases (KDOOD/TDOOD), pp 39–46
Fu Y, Han J, Koperski K, Wang W, Zaiane O (1996) DMQL: a data mining query language for relational databases. In: Proceedings of the first workshop on research issues in data mining and knowledge discovery (DMKD), pp 122–133
Garofalakis MN, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. In: Proceedings of 25th international conference on very large data bases (VLDB), pp 223–234
Garofalakis MN, Hyun D, Rastogi R, Shim K (2003) Building decision trees with constraints. Data Min Knowl Discov 7(2):187–214
Giannotti F, Nanni M, Pedreschi D (2000) Logic-based knowledge discovery in databases. In: Proceedings of tenth European–Japanese conference on information modelling and knowledge bases (EJC), pp 279–283
Gilpin S, Davidson I (2011) Incorporating SAT solvers into hierarchical clustering algorithms: an efficient and flexible approach. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1136–1144
Grossi V, Monreale A, Nanni M, Pedreschi D, Turini F (2015) Software engineering and formal methods: SEFM 2015 collocated workshops: ATSE, HOFM, MoKMaSD, and VERY*SCART, York, UK, September 7–8, 2015. Revised selected papers, chap. clustering formulation using constraint optimization. Springer, Berlin, Heidelberg, pp 93–107. doi:10.1007/978-3-662-49224-6_9
Grossi V, Romei A (2012) XQuake as a constraint-based mining language. In: Proceedings of the ECAI 2012 workshop on combining constraint solving with mining and learning (CoCoMile), pp 90–91
Gu W, Chen B, Hu J (2010) Combining binary-SVM and pairwise label constraints for multi-label classification. In: Proceedings of the IEEE international conference on systems, man and cybernetics (SMC), pp 4176–4181
Guns T, Nijssen S, De Raedt L (2011) Itemset mining: a constraint programming perspective. Artif Intell 175(12–13):1951–1983
Guns T, Nijssen S, De Raedt L (2013) k-Pattern set mining under constraints. IEEE Trans Knowl Data Eng 25(2):402–418
Guns T, Dries A, Tack G, Nijssen S, De Raedt L (2013a) MiningZinc: a modeling language for constraint-based mining. In: Rossi F (ed) IJCAI 2013, proceedings of the 23rd international joint conference on artificial intelligence, Beijing, China, August 3–9, 2013. IJCAI/AAAI. http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6947
Guns T, Dries A, Tack G, Nijssen S, De Raedt L (2013b) The MiningZinc framework for constraint-based itemset mining. In: Ding W, Washio T, Xiong H, Karypis G, Thuraisingham BM, Cook DJ, Wu X (eds) 13th IEEE international conference on data mining workshops, ICDM workshops, TX, December 7–10, 2013. IEEE Computer Society, pp 1081–1084. doi:10.1109/ICDMW.2013.38
Han J, Lakshmanan LVS, Ng RT (1999) Constraint-based multidimensional data mining. IEEE Comput 32(8):46–50
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86
Han J, Fu Y (1999) Mining multiple-level association rules in large databases. IEEE Trans Knowl Data Eng 11(5):798–805
Han J, Kamber M (2012) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco
Hansen P, Aloise D (2009) A survey on exact methods for minimum sum-of-squares clustering, pp 1–2. http://www.math.iit.edu/Buck65files/msscStLouis.pdf
Har-Peled S, Roth D, Zimak D (2002) Constraint classification: a new approach to multiclass classification. In: Proceedings of the 13th international conference algorithmic learning theory (ALT), pp 365–379
Hirate Y, Yamana H (2006) Generalized sequential pattern mining with item intervals. J Comput 1(3):51–60
Hu YH, Kao YH (2011) Mining sequential patterns with consideration to recency, frequency, and monetary. In: Proceedings of the Pacific Asia conference on information systems (PACIS), pp 78–91
Hu YH, Yen TW (2010) Considering RFM-values of frequent patterns in transactional databases. In: Proceedings of the 2nd international conference on software engineering and data mining (SEDM), pp 422–427
Hwang JH, Gu MS (2014) Ontology based service frequent pattern mining. Future Inf Technol 309:809–814. doi:10.1007/978-3-642-55038-6-123
Imielinski T, Mannila H (1996) A database perspective on knowledge discovery. Commun ACM 39(11):58–64
Imielinski T, Virmani A (1999) MSQL: a query language for database mining. Data Min Knowl Discov 2(4):373–408
Jeudy B, Boulicaut JF (2002) Optimization of association rule mining queries. Intell Data Anal 6(4):341–357
Kestler H, Kraus J, Palm G, Schwenker F (2006) On the effects of constraints in semi-supervised hierarchical clustering. In: Schwenker F, Marinai S (eds) Artificial neural networks in pattern recognition, Lecture Notes in Computer Science, vol 4087. Springer, Berlin, Heidelberg, pp 57–66
Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the nineteenth international conference on machine learning, ICML ’02. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 307–314. http://dl.acm.org/citation.cfm?id=645531.655989
Kumar N, Kummamuru K (2008) Semisupervised clustering with metric learning using relative comparisons. IEEE Trans Knowl Data Eng 20(4):496–503
Kummamuru K, Krishnapuram R, Agrawal R (2004) Learning spatially variant dissimilarity (SVaD) measures. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 611–616
Lakshmanan LVS, Ng R, Han J, Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. ACM SIGMOD Rec 28(2):157–168. doi:10.1145/304181.304196
Lange T, Law MHC, Jain AK, Buhmann JM (2005) Learning with constrained and unlabeled data. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 731–738
Law MHC, Topchy AP, Jain AK (2005) Model-based clustering with probabilistic constraints. In: Kargupta et al., pp 641–645. doi:10.1137/1.9781611972757.77
Law MHC, Topchy A, Jain AK (2004) Structural, syntactic, and statistical pattern recognition: joint IAPR international workshops, SSPR 2004 and SPR 2004, Lisbon, Portugal, August 18–20, 2004. Proceedings, chap. Clustering with soft and group constraints. Springer, Berlin, Heidelberg, pp 662–670. doi:10.1007/978-3-540-27868-9_72
Law Y, Wang H, Zaniolo C (2004) Query languages and data models for database sequences and data streams. In: Proceedings of the 30th international conference on very large data bases (VLDB), pp 492–503
Leung CKS, Hao B, Brajczuk DA (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. In: Proceedings of the 2010 ACM symposium on applied computing (SAC), pp 1034–1038
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217
Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 421–428
Lin MY, Hsueh SC, Chang CW (2008) Fast discovery of sequential patterns in large databases using effective time-indexing. Inf Sci 178(22):4228–4245
Lin MY, Lee SY (2005) Efficient mining of sequential patterns with time constraints by delimited pattern growth. Knowl Inf Syst 7(4):499–514
Liu EY, Zhang Z, Wang W (2011) Clustering with relative constraints. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 947–955
Lu Z, Carreira-Perpiñán MÁ (2008) Constrained spectral clustering through affinity propagation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 1–8
Lucey S, Ashraf AB (2013) Nearest neighbor classifier generalization through spatially constrained filters. Pattern Recognit 46(1):325–331. doi:10.1016/j.patcog.2012.06.009
Lu Z, Leen TK (2007) Penalized probabilistic clustering. Neural Comput 19(6):1528–1567. doi:10.1162/neco.2007.19.6.1528
Mabroukeh NR, Ezeife CI (2010) A taxonomy of sequential pattern mining algorithms. ACM Comput Surv 43(1):1–3
Mansingh G, Osei-Bryson KM, Reichgelt H (2011) Using ontologies to facilitate post-processing of association rules by domain experts. Inf Sci 181(3):419–434
Marinica C, Guillet F (2010) Knowledge-based interactive postmining of association rules using ontologies. IEEE Trans Knowl Data Eng 22(6):784–797. doi:10.1109/TKDE.2010.29
Marriott K, Nethercote N, Rafeh R, Stuckey PJ, de la Banda MG, Wallace M (2008) The design of the Zinc modelling language. Constraints 13(3):229–267. doi:10.1007/s10601-008-9041-4
Masseglia F, Poncelet P, Teisseire M (2009) Efficient mining of sequential patterns with time constraints: reducing the combinations. Expert Syst Appl 36(2):2677–2690
Masson C, Robardet C, Boulicaut J (2004) Optimizing subset queries: a step towards SQL-based inductive databases for itemsets. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM (eds) Proceedings of the 2004 ACM symposium on applied computing (SAC), Nicosia, Cyprus, March 14–17, 2004. ACM, pp 535–539. doi:10.1145/967900.968013
Meo R, Psaila G, Ceri S (1998) An extension to SQL for mining association rules. Data Min Knowl Disc 2(2):195–224. doi:10.1023/A:1009774406717
Meo R, Psaila G (2006) An XML-based database for knowledge discovery. In: Proceedings of the 10th international conference on extending database technology (EDBT), pp 814–828
Morzy T, Zakrzewicz M (1997) SQL-like language for database mining. In: Proceedings of the first east-European symposium on advances in databases and information systems (ADBIS), pp 331–317
Nethercote N, Stuckey PJ, Becket R, Brand S, Duck GJ, Tack G (2007) MiniZinc: towards a standard CP modelling language. In: Proceedings of the 13th international conference on principles and practice of constraint programming, CP’07. Springer, Berlin, Heidelberg, pp 529–543. http://dl.acm.org/citation.cfm?id=1771668.1771709
Nguyen N, Caruana R (2008) Improving classification with pairwise constraints: a margin-based approach. In: Daelemans W, Goethals B, Morik K (eds) ECML/PKDD (2), Lecture Notes in Computer Science, vol 5212. Springer, pp 113–124. http://dblp.uni-trier.de/db/conf/pkdd/pkdd2008-2.html#NguyenC08
Nijssen S, Fromont E (2007) Mining optimal decision trees from itemset lattices. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 530–539
Nijssen S, Fromont E (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Min Knowl Disc 21(1):9–51. doi:10.1007/s10618-010-0174-x
Niyogi P, Pierrot JB, Siohan O (2000) Multiple classifiers by constrained minimization. In: Proceedings of the 2000 IEEE international conference on acoustics, speech, and signal processing, vol 6, ICASSP ’00. IEEE Computer Society, Washington, DC, pp 3462–3465. doi:10.1109/ICASSP.2000.860146
Okabe M, Yamada S (2012) Clustering by learning constraints priorities. In: Proceedings of the 12th international conference on data mining (ICDM), pp 1050–1055
Park SH, Fürnkranz J (2008) Multi-label classification with label constraints. In: Technical report, knowledge engineering group, TU Darmstadt
Pei J, Han J, Lakshmanan LVS (2004) Pushing convertible constraints in frequent itemset mining. Data Min Knowl Disc 8(3):227–252
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern growth methods. J Intell Inf Syst 28(2):133–160
Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 350–354
Pinto H, Han J, Pei J, Wang K, Chen Q, Dayal U (2001) Multi-dimensional sequential pattern mining. In: Proceedings of the 2001 ACM CIKM international conference on information and knowledge management (CIKM), pp 81–88
Plantevit M, Laurent A, Laurent D, Teisseire M, Choong YW (2010) Mining multidimensional and multilevel sequential patterns. Trans Knowl Discov Data 4(1):4
Pyle D (1999) Data preparation for data mining. Morgan Kaufmann Publishers Inc., San Francisco
Richter L, Wicker J, Kessler K, Kramer S (2008) An inductive database and query language in the relational model. In: Proceedings of the 11th international conference on extending database technology (EDBT), pp 740–744
Rigollet P, Tong X (2011a) Neyman-Pearson classification, convexity and stochastic constraints. J Mach Learn Res 12:2831–2855
Rigollet P, Tong X (2011b) Neyman-Pearson classification under a strict constraint. Proc Track J Mach Learn Res 19:595–614
Romei A, Ruggieri S, Turini F (2006) KDDML: a middleware language and system for knowledge discovery in databases. Data Knowl Eng 57(2):179–220. doi:10.1016/j.datak.2005.04.007
Romei A, Turini F (2011) Programming the KDD process using XQuery. In: Proceedings of the international conference on knowledge discovery and information retrieval (KDIR), pp 131–139
Romei A, Turini F (2010) XML data mining. Softw Pract Exp 40(2):101–130. doi:10.1002/spe.944
Romei A, Turini F (2011) Inductive database languages: requirements and examples. Knowl Inf Syst 26(3):351–384
Ruiz C, Spiliopoulou M, Ruiz EM (2010) Density-based semi-supervised clustering. Data Min Knowl Disc 21(3):345–370
Sarawagi S, Thomas S, Agrawal R (2000) Integrating association rule mining with relational database systems: alternatives and implications. Data Min Knowl Disc 4(2/3):89–125
Schultz M, Joachims T (2003) Learning a distance metric from relative comparisons. In: Thrun S, Saul LK, Schölkopf B (eds) Proceedings of advances in neural information processing systems (NIPS), December 8–13, 2003, Vancouver and Whistler, British Columbia. MIT Press, pp 41–48. http://papers.nips.cc/paper/2366-learning-a-distance-metric-from-relative-comparisons
Shankar S (2009) Utility sentient frequent itemset mining and association rule mining: a literature survey and comparative study. Int J Soft Comput Appl 4:81–95
Small K, Wallace BC, Brodley CE, Trikalinos TA (2011) The constrained weight space SVM: learning with ranked features. In: Proceedings of the 28th international conference on machine learning (ICML), pp 865–872
Soulet A, Crémilleux B (2005) Optimizing constraint-based mining by automatically relaxing constraints. In: Proceedings of the 5th IEEE international conference on data mining (ICDM), 27–30 November 2005, Houston. IEEE Computer Society, pp 777–780. doi:10.1109/ICDM.2005.112
Soulet A, Crémilleux B (2009) Mining constraint-based patterns using automatic relaxation. Intell Data Anal 13(1):109–133
Soulet A, Crémilleux B, Plantevit M (2011) Summarizing contrasts by recursive pattern mining. In: Spiliopoulou M, Wang H, Cook DJ, Pei J, Wang W, Zaïane OR, Wu X (eds) Data mining workshops (ICDMW), 2011 IEEE 11th international conference on, Vancouver, December 11, 2011. IEEE Computer Society, pp 1155–1162. doi:10.1109/ICDMW.2011.161
Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21st conference on very large data bases (VLDB), pp 407–419
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology (EDBT), pp 3–17
Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the third international conference on knowledge discovery and data mining (KDD), pp 67–73
Sriphaew K, Theeramunkong T (2002) A new method for finding generalized frequent itemsets in generalized association rule mining. In: Proceedings of the 7th IEEE symposium on computers and communications (ISCC), pp 1040–1045
Strehl A, Ghosh J (2003) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J Comput 15(2):208–230
Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Addison Wesley, Boston
Tao F, Murtagh F (2003) Weighted association rule mining using weighted support and significance framework. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 661–666
Trasarti R, Bonchi F, Goethals B (2008) Sequence mining automata: a new technique for mining frequent sequences under regular expressions. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), pp 1061–1066
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786. doi:10.1109/TKDE.2012.59
Tsochantaridis I, Joachims T, Hofmann T, Altun Y (2005) Large margin methods for structured and interdependent output variables. J Mach Learn Res 6:1453–1484
Vanderlooy S, Sprinkhuizen-Kuyper IG, Smirnov EN, van den Herik HJ (2009) The ROC isometrics approach to construct reliable classifiers. Intell Data Anal 13(1):3–37. http://dl.acm.org/citation.cfm?id=1551758.1551760
Vens C, Struyf J, Schietgat L, Dzeroski S, Blockeel H (2008) Decision trees for hierarchical multi-label classification. Mach Learn 73(2):185–214
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. doi:10.1007/s11222-007-9033-z
Vu V, Labroche N, Bouchon-Meunier B (2010) An efficient active constraint selection algorithm for clustering. In: 20th international conference on pattern recognition, ICPR 2010, Istanbul, Turkey, 23–26 August 2010. IEEE Computer Society, pp 2969–2972. doi:10.1109/ICPR.2010.727
Wagstaff K, Basu S, Davidson I (2006) When is constrained clustering beneficial, and why? In: Proceedings, the twenty-first national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference (AAAI)
Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on Innovative applications of artificial intelligence (AAAI/IAAI), pp 1103–1110
Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc., San Francisco, pp 577–584. http://dl.acm.org/citation.cfm?id=645530.655669
Wang K, Jiang Y, Yu JX, Dong G, Han J (2005) Divide-and-approximate: a novel constraint push strategy for iceberg cube mining. IEEE Trans Knowl Data Eng 17(3):354–368
Wang X, Rostoker C, Hamilton HJ (2012) A density-based spatial clustering for physical constraints. J Intell Inf Syst 38(1):269–297
Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Disc 28(1):1–30. doi:10.1007/s10618-012-0291-9
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 563–572
Wang F, Ding CHQ, Li T (2009) Integrated KL (K-means-Laplacian) clustering: a new clustering approach by combining attribute data and pairwise relations. In: Proceedings of the SIAM international conference on data mining (SDM), pp 38–48
Wang W, Yang J, Yu PS (2000) Efficient mining of weighted association rules (WAR). In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 270–274
Wei JT, Lin SY, Wu HH (2010) A review of the application of RFM model. Afr J Bus Manag 4(19):4199–4206
Witten IH, Frank E, Hall M (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, San Francisco
Wu CM, Huang YF (2011) Generalized association rule mining using an efficient data structure. Expert Syst Appl 38(6):7277–7290
Xing EP, Ng AY, Jordan MI, Russell SJ (2002) Distance metric learning with application to clustering with side-information. In: Advances in neural information processing systems (NIPS), pp 505–512
Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning, with application to clustering with side-information. Advances in neural information processing systems 15. MIT Press, Cambridge
Yan R, Zhang J, Yang J, Hauptmann AG (2006) A discriminative learning framework with pairwise constraints for video object classification. IEEE Trans Pattern Anal Mach Intell 28(4):578–593. doi:10.1109/TPAMI.2006.65
Yan W, Goebel KF (2004) Designing classifier ensembles with constrained performance requirements. In: Proceedings of the SPIE defense security symposium, multisensor multisource information fusion: architectures, algorithms, and applications, pp 78–87
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the fourth SIAM international conference on data mining (SDM), pp 482–486
Yun U (2008) A new framework for detecting weighted sequential patterns in large sequence databases. Knowl Based Syst 21(2):110–122
Yun U, Shin H, Ryu KH, Yoon E (2012) An efficient mining algorithm for maximal weighted frequent patterns in transactional databases. Knowl Based Syst 33:53–64
Yun U, Leggett JJ (2005) WFIM: weighted frequent itemset mining with a weight range and a minimum weight. In: Kargupta et al., pp 636–640. doi:10.1137/1.9781611972757.76
Yun U, Ryu KH (2010) Discovering important sequential patterns with length-decreasing weighted support constraints. Int J Inf Technol Decis Mak 9(4):575–599
Yun U, Ryu KH (2011) Approximate weighted frequent pattern mining with/without noisy environments. Knowl Based Syst 24(1):73–82
Zaidan O, Eisner J (2008) Modeling annotators: a generative approach to learning from annotator rationales. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP), pp 31–40
Zaki MJ (2000) Sequence mining in categorical domains: incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management (CIKM), pp 422–429
Zhang J, Yan R (2007) On the value of pairwise constraints in classification and consistency. In: Proceedings of the 24th international conference on machine learning, ICML ’07. ACM, New York, pp 1111–1118. doi:10.1145/1273496.1273636
Zhang C, Zhang S (2002) Association rule mining, models and algorithms, lecture notes in computer science. Springer, New York
Zhang Y, Zhang L, Nie G, Shi Y (2009) A survey of interestingness measures for association rules. In: Proceedings of the second international conference on business intelligence and financial engineering, (BIFE), pp 460–463
Zhong S, Ghosh J (2003) Scalable, balanced model-based clustering. In: Proceedings of the third SIAM international conference on data mining (SDM), San Francisco, pp 71–82
Acknowledgments
This work was supported by the European Commission under the project “Inductive Constraint Programming (ICON)” contract number FP7-284715, and by a Grant for “Big Data Social Mining” of the University of Pisa. We warmly thank the anonymous referees for their very valuable suggestions.
Additional information
Responsible editor: Kristian Kersting.
Cite this article
Grossi, V., Romei, A. & Turini, F. Survey on using constraints in data mining. Data Min Knowl Disc 31, 424–464 (2017). https://doi.org/10.1007/s10618-016-0480-z
DOI: https://doi.org/10.1007/s10618-016-0480-z