
Formal and computational properties of the confidence boost of association rules

Published: 25 December 2013

Abstract

Some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. We push the idea of redundancy further to obtain an intuitive notion of the novelty of an association rule with respect to other rules. Namely, a rule is irredundant because its confidence is higher than what the rest of the rules would suggest; one can then ask: how much higher?
We propose to measure this sort of novelty through the confidence boost of a rule. Acting as a complement to confidence and support, the confidence boost helps to obtain small and crisp sets of mined association rules, and it solves the well-known problem that, in certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two versions of the notion of confidence boost, one of them a natural generalization of the other. We develop algorithms to filter rules according to their confidence boost, compare the concept to some similar notions in the literature, and describe the results of experiments employing the new notions on standard benchmark datasets. We also describe an open source association mining tool that embodies one of our variants of confidence boost in such a way that the data mining process does not require the user to select a value for any parameter.
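
The abstract does not state the formal definition of confidence boost, so the following Python sketch only illustrates the underlying idea under a simplifying assumption: a rule's confidence is divided by the highest confidence among rules with the same consequent and a strictly smaller antecedent (the empty antecedent included). This simplified ratio is an assumption made for illustration and is not the definition analyzed in the article; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """supp(antecedent + consequent) / supp(antecedent); 0.0 if the antecedent never occurs."""
    ante = frozenset(antecedent)
    supp_ante = support(ante, transactions)
    if supp_ante == 0:
        return 0.0
    return support(ante | frozenset(consequent), transactions) / supp_ante


def simplified_boost(antecedent, consequent, transactions):
    """Illustrative ratio only (NOT the article's definition): confidence of the
    rule divided by the best confidence reachable with the same consequent and
    a proper subset of the antecedent."""
    ante = frozenset(antecedent)
    if not ante:
        raise ValueError("antecedent must be non-empty")
    # Every proper subset of the antecedent, including the empty set.
    rivals = [
        confidence(sub, consequent, transactions)
        for size in range(len(ante))
        for sub in combinations(ante, size)
    ]
    strongest = max(rivals)
    if strongest == 0:
        raise ValueError("consequent never occurs; the ratio is undefined")
    return confidence(ante, consequent, transactions) / strongest


# Toy data (hypothetical): b is very common, and slightly less common when a is present.
transactions = [{"a", "b"}] * 4 + [{"a"}] + [{"b"}] * 5

rule_confidence = confidence({"a"}, {"b"}, transactions)
rule_boost = simplified_boost({"a"}, {"b"}, transactions)
print(rule_confidence, rule_boost)
```

In this toy dataset the rule a -> b would pass a confidence threshold of, say, 0.75, yet b is less frequent among transactions containing a than in the data overall. The ratio falls below 1, which is the kind of negatively correlated rule that the abstract says a confidence bound alone may let through.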


    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 7, Issue 4
    November 2013
    162 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/2541268

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 December 2013
    Revised: 01 September 2013
    Accepted: 01 April 2013
    Received: 01 April 2013
    Published in TKDD Volume 7, Issue 4


    Author Tags

    1. Association rule mining
    2. association rule quality
    3. confidence

    Qualifiers

    • Research-article
    • Research
    • Refereed


