Soft constraints for pattern mining

Published: 01 April 2015

Abstract

    Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In practice, many constraints require threshold values whose choice is often arbitrary. The difficulty increases when several thresholds are required and have to be combined. Moreover, patterns that barely miss a threshold are not extracted even though they may be relevant. This paper advocates introducing softness into the pattern discovery process. Using Constraint Programming, we propose efficient methods to relax threshold constraints as well as the constraints involved in patterns such as top-k patterns and skypatterns. We show the relevance and efficiency of our approach through a case study in chemoinformatics on discovering toxicophores.
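
    To make the idea of a relaxed threshold concrete, the following is a minimal sketch, not the Constraint Programming method of the paper. It enumerates itemsets by brute force over a hypothetical toy database and accepts an itemset when its violation of the minimum-frequency threshold stays within a tolerance. The names `TRANSACTIONS`, `frequency`, `mine`, `min_freq`, and `tolerance` are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transaction database (not data from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def frequency(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def mine(transactions, min_freq, tolerance=0):
    """Return {itemset: (frequency, violation)} for accepted itemsets.

    The violation is how far the frequency falls below min_freq (0 if the
    threshold is met). With tolerance=0 this is the usual hard constraint;
    a positive tolerance also keeps itemsets that barely miss the threshold.
    """
    items = sorted(set().union(*transactions))
    results = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            freq = frequency(itemset, transactions)
            violation = max(0, min_freq - freq)
            if violation <= tolerance:
                results[itemset] = (freq, violation)
    return results


hard = mine(TRANSACTIONS, min_freq=4)               # hard threshold
soft = mine(TRANSACTIONS, min_freq=4, tolerance=1)  # relaxed threshold

for itemset, (freq, violation) in sorted(soft.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), "freq =", freq, "violation =", violation)
```

    In this toy setting the hard threshold keeps only the three single items, while the relaxed version also keeps the three pairs, each of which misses the threshold by one transaction. Exhaustive enumeration is used here only for readability and does not scale beyond small examples.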

      Published In

      Journal of Intelligent Information Systems, Volume 44, Issue 2, April 2015, 115 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Chemoinformatics
      2. Constraint Programming
      3. Constraint-based pattern mining
      4. Disjunctive relaxation
      5. Soft constraints
      6. Soft skypatterns
