Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Self-sufficient itemsets: An approach to screening potentially interesting associations between items

Published: 18 January 2010 Publication History

Abstract

Self-sufficient itemsets are those whose frequency cannot be explained solely by the frequency of either their subsets or of their supersets. We argue that itemsets that are not self-sufficient will often be of little interest to the data analyst, as their frequency should be expected once that of the itemsets on which their frequency depends is known. We present tests for statistically sound discovery of self-sufficient itemsets, and computational techniques that allow those tests to be applied as a post-processing step for any itemset discovery algorithm. We also present a measure for assessing the degree of potential interest in an itemset that complements these statistical measures.

References

[1]
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining associations between sets of items in massive databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data. 207--216.
[2]
Agresti, A. 1992. A survey of exact inference for contingency tables. Statis. Sci. 7, 1, 131--153.
[3]
Agresti, A. 2002. Categorical Data Analysis. Wiley-Interscience, New York.
[4]
Aumann, Y. and Lindell, Y. 1999. A statistical theory for quantitative association rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 261--270.
[5]
Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., and Lakhal, L. 2000. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (CL'00). Springer-Verlag, Berlin, 972--986.
[6]
Bayardo, Jr., R. J. and Agrawal, R. 1999. Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). 145--154.
[7]
Bayardo, Jr., R. J., Agrawal, R., and Gunopulos, D. 2000. Constraint-based rule mining in large, dense databases. Data Min. Knowl. Disc. 4, 2/3, 217--240.
[8]
Brijs, T., Swinnen, G., Vanhoof, K., and Wets, G. 1999. Using association rules for product assortment decisions: A case study. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 254--260.
[9]
Calders, T. and Goethals, B. 2002. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02). Springer, Berlin, 74--85.
[10]
Chan, R., Yang, Q., and Shen, Y. D. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. 19--26.
[11]
Cheng, J., Ke, Y., and Ng, W. 2006. Δ-tolerance closed frequent itemsets. In Proceedings of the 6th International Conference on Data Mining. 139--148.
[12]
Cooley, R., Tan, P.-N., and Srivastava, J. 1999. Discovery of interesting usage patterns from Web data. In Proceedings of the International WEBKDD'99 Workshop. Springer, Berlin, 163--182.
[13]
DuMouchel, W. 1999. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Americ. Statis. 53, 3, 177--190.
[14]
DuMouchel, W. and Pregibon, D. 2001. Empirical Bayes screening for multi-item associations. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM Press, New York, NY, 76.
[15]
Goethals, B. 2007. NDI. Software. http://www.adrem.ua.ac.be/goethals/software/.
[16]
Hettich, S. and Bay, S. D. 2006. The UCI KDD archive. Department of Information and Computer Science. University of California, Irvine, CA. http://kdd.ics.uci.edu.
[17]
Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian J. Statis. 6, 65--70.
[18]
Jaroszewicz, S. and Simovici, D. A. 2004. Interestingness of frequent itemsets using Bayesian networks as background knowledge. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). R. Kohavi, J. Gehrke, and J. Ghosh, Eds. ACM Press, New York, NY, 178--186.
[19]
Liu, B., Hsu, W., and Ma, Y. 2001. Identifying non-actionable association rules. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). 329--334.
[20]
Malik, H. H. and Kender, J. R. 2006. High quality, efficient hierarchical document clustering using closed interesting itemsets. In Proceedings of the 6th IEEE International Conference on Data Mining. IEEE, 991--996.
[21]
Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'98). AAAI Press, Menlo Park, US, 27--78.
[22]
Newman, D. J., Hettich, S., Blake, C., and Merz, C. J. 2006. UCI repository of machine learning databases. {Machine-readable data repository}. Department of Information and Computer Science, University of California, Irvine, CA.
[23]
Pei, J., Dong, G., Zou, W., and Han, J. 2002. On computing condensed frequent pattern bases. In Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM'02). 378--385.
[24]
Piatetsky-Shapiro, G. 1991. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and J. Frawley, Eds. AAAI/MIT Press, Menlo Park, CA., 229--248.
[25]
Shaffer, J. P. 1995. Multiple hypothesis testing. Annual Rev. Psych. 46, 561--584.
[26]
Tabachnick, B. G. and Fidell, L. S. 2001. Using Multivariate Statistics. Allyn and Bacon, Boston, MA.
[27]
Webb, G. I. 1995. OPUS: An efficient admissible algorithm for unordered search. J. Arti. Intell. Res. 3, 431--465.
[28]
Webb, G. I. 2001. Discovering associations with numeric variables. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). The ACM, New York, NY, 383--388.
[29]
Webb, G. I. 2007. Discovering significant patterns. Mach. Learn. 68, 1, 1--33.
[30]
Webb, G. I. 2009. Magnum Opus Version 4.3. Software, G. I. Webb & Associates, Melbourne, Aust.
[31]
Webb, G. I. and Zhang, S. 2005. K-optimal rule discovery. Data Mining Knowl. Discov. 10, 1, 39--79.
[32]
Wu, X., Barbará, D., and Ye, Y. 2003. Screening and interpreting multi-item associations based on log-linear modeling. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 276--285.
[33]
Xin, D., Han, J., Yan, X., and Cheng, H. 2005. Mining compressed frequent-pattern sets. In Proceedings of the International Conference on Very Large Databases (VLDB'05). 709--720.
[34]
Yao, H. and Hamilton, H. J. 2006. Mining itemset utilities from transaction databases. Data Knowl. Engin. 59, 603--626.
[35]
Zaki, M. J. 2000. Generating non-redundant association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). ACM, New York, NY, 34--43.
[36]
Zaki, M. J. 2004. Mining non-redundant association rules. Data Mining Knowl. Discov. 9, 3, 223--248.
[37]
Zaki, M. J. and Hsiao, C. J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining. 457--473.
[38]
Zhang, H., Padmanabhan, B., and Tuzhilin, A. 2004. On the discovery of significant statistical quantitative rules. In Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD'04). ACM Press, New York, NY, 374--383.
[39]
Zheng, Z., Kohavi, R., and Mason, L. 2001. Real world performance of association rule algorithms. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD'01). ACM, New York, NY, 401--406.

Cited By

View all
  • (2022)Associative Classifier Coupled With Unsupervised Feature Reduction for Dengue Fever Classification Using Gene Expression DataIEEE Access10.1109/ACCESS.2022.319893710(88340-88353)Online publication date: 2022
  • (2022)Selecting Outstanding Patterns Based on Their NeighbourhoodAdvances in Intelligent Data Analysis XX10.1007/978-3-031-01333-1_15(185-198)Online publication date: 7-Apr-2022
  • (2021)Efficient Mining of Non-Redundant Periodic Frequent PatternsVietnam Journal of Computer Science10.1142/S219688882150021408:04(455-469)Online publication date: 7-Jan-2021
  • Show More Cited By

Index Terms

  1. Self-sufficient itemsets: An approach to screening potentially interesting associations between items

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Knowledge Discovery from Data
    ACM Transactions on Knowledge Discovery from Data  Volume 4, Issue 1
    January 2010
    135 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/1644873
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 January 2010
    Accepted: 01 May 2009
    Revised: 01 March 2009
    Received: 01 March 2008
    Published in TKDD Volume 4, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Association discovery
    2. association rules
    3. itemset discovery
    4. itemset screening
    5. statistical evaluation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 25 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Associative Classifier Coupled With Unsupervised Feature Reduction for Dengue Fever Classification Using Gene Expression DataIEEE Access10.1109/ACCESS.2022.319893710(88340-88353)Online publication date: 2022
    • (2022)Selecting Outstanding Patterns Based on Their NeighbourhoodAdvances in Intelligent Data Analysis XX10.1007/978-3-031-01333-1_15(185-198)Online publication date: 7-Apr-2022
    • (2021)Efficient Mining of Non-Redundant Periodic Frequent PatternsVietnam Journal of Computer Science10.1142/S219688882150021408:04(455-469)Online publication date: 7-Jan-2021
    • (2021)Surprisingness - A Novel Objective Interestingness Measure in Hypergraph Pattern Mining from Knowledge Graphs for Common Sense Learning2021 IEEE International Conference on Big Knowledge (ICBK)10.1109/ICKG52313.2021.00017(54-61)Online publication date: Dec-2021
    • (2021)Using the internet of things in smart energy systems and networksSustainable Cities and Society10.1016/j.scs.2021.10278368(102783)Online publication date: May-2021
    • (2020)Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual ExclusivityProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3394486.3403124(813-823)Online publication date: 23-Aug-2020
    • (2020)ARCAMETESProceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops10.1145/3387940.3391482(396-403)Online publication date: 27-Jun-2020
    • (2020)Program Slicing and Execution Tracing for Differential Testing at Adobe AnalyticsProceedings of the 28th International Conference on Program Comprehension10.1145/3387904.3389292(411-415)Online publication date: 13-Jul-2020
    • (2020)Mining Non-redundant Periodic Frequent PatternsIntelligent Information and Database Systems10.1007/978-3-030-41964-6_28(321-331)Online publication date: 4-Mar-2020
    • (2019)Context-Enriched Regular Human Behavioral Pattern Detection From Body Sensors DataIEEE Access10.1109/ACCESS.2019.29041227(33834-33850)Online publication date: 2019
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media