Abstract
This paper provides an overview of the current state of the art in using constraints in knowledge discovery and data mining. Using constraints requires mechanisms for defining and evaluating them during the knowledge extraction process. We give a structured account of three main groups of constraints, based on the specific context in which they are defined and used. The aim is to provide a complete view of constraints as a building block of data mining methods.
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Grossi, V., Pedreschi, D., Turini, F. (2016). Data Mining and Constraints: An Overview. In: Bessiere, C., De Raedt, L., Kotthoff, L., Nijssen, S., O'Sullivan, B., Pedreschi, D. (eds) Data Mining and Constraint Programming. Lecture Notes in Computer Science, vol 10101. Springer, Cham. https://doi.org/10.1007/978-3-319-50137-6_2
DOI: https://doi.org/10.1007/978-3-319-50137-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50136-9
Online ISBN: 978-3-319-50137-6
eBook Packages: Computer Science (R0)