Formal and computational properties of the confidence boost of association rules

Author:

José L. Balcázar

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 7, Issue 4

Article No.: 19, Pages 1 - 41

https://doi.org/10.1145/2541268.2541272

Published: 25 December 2013

Abstract

Some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases of absolutely minimum size. We push the intuition of redundancy further to find an intuitive notion of novelty of an association rule, with respect to other rules. Namely, an irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; then, one can ask: how much higher?

We propose to measure such a sort of novelty through the confidence boost of a rule. Acting as a complement to confidence and support, the confidence boost helps to obtain small and crisp sets of mined association rules and solves the well-known problem that, in certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two versions of the notion of confidence boost, one of them a natural generalization of the other. We develop algorithms to filter rules according to their confidence boost, compare the concept to some similar notions in the literature, and describe the results of some experimentation employing the new notions on standard benchmark datasets. We describe an open source association mining tool that embodies one of our variants of confidence boost in such a way that the data mining process does not require the user to select any value for any parameter.
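The negative-correlation problem mentioned above can be made concrete with a small sketch. The abstract does not state the formula for the confidence boost, so the code below uses a simplified stand-in rather than the article's definition: it assumes the quantity of interest is the ratio between a rule's confidence and the best confidence reached by any rule with a strictly smaller antecedent and the same consequent. The toy data, the function names, and the name `simplified_boost` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not from the article): 100 transactions in which
# 90 contain coffee, 25 contain tea, and 20 contain both.
TRANSACTIONS = (
    [frozenset({"tea", "coffee"})] * 20
    + [frozenset({"tea"})] * 5
    + [frozenset({"coffee"})] * 70
    + [frozenset()] * 5
)


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions):
    """Confidence of X -> Y: count(X union Y) / count(X)."""
    x = frozenset(antecedent)
    covered = count(x, transactions)
    if covered == 0:
        raise ValueError("antecedent never occurs in the data")
    return count(x | frozenset(consequent), transactions) / covered


def simplified_boost(antecedent, consequent, transactions):
    """Assumed stand-in for a boost-style ratio (not the article's definition).

    Divides the rule's confidence by the highest confidence among rules
    that keep the consequent and use a proper subset of the antecedent,
    including the empty antecedent.
    """
    x = frozenset(antecedent)
    if not x:
        raise ValueError("a non-empty antecedent is required")
    own = confidence(x, consequent, transactions)
    rivals = [
        confidence(sub, consequent, transactions)
        for size in range(len(x))  # sizes 0 .. len(x) - 1: proper subsets
        for sub in combinations(sorted(x), size)
    ]
    best = max(rivals)
    if best == 0:
        raise ValueError("consequent never occurs in the data")
    return own / best


if __name__ == "__main__":
    conf = confidence({"tea"}, {"coffee"}, TRANSACTIONS)
    boost = simplified_boost({"tea"}, {"coffee"}, TRANSACTIONS)
    print(f"confidence(tea -> coffee) = {conf:.3f}")  # 0.800
    print(f"simplified boost          = {boost:.3f}")  # 0.889
```

With a confidence threshold of, say, 0.75, the rule tea -> coffee passes even though tea buyers are less likely than average to buy coffee (0.8 against 0.9 overall). A ratio below 1 flags exactly this situation, which is the kind of filtering the abstract attributes to the confidence boost.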

Index Terms

Formal and computational properties of the confidence boost of association rules
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Transactions on Knowledge Discovery from Data Volume 7, Issue 4

November 2013

162 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/2541268

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 December 2013

Revised: 01 September 2013

Accepted: 01 April 2013

Received: 01 April 2013

Published in TKDD Volume 7, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Ministerio de Ciencia e Innovación
Pascal-2 Network of the European Union

Bibliometrics

Total Citations: 5

Total Downloads: 399

Downloads (Last 12 months): 9

Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Oct 2024

Citations

Cited By

Guillamet, G., Seguí, F., Vidal-Alaball, J., and López, B. 2023. CauRuler. Computers in Biology and Medicine 155:C. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1016/j.compbiomed.2023.106636
Raghavan, S. and Raghavan, S. 2017. Analytics using metadata associations for digital investigations. CSI Transactions on ICT 5:3, 315-338. Online publication date: 4-Jul-2017.
https://doi.org/10.1007/s40012-017-0174-8
Sahoo, J., Das, A., and Goswami, A. 2015. An efficient approach for mining association rules from high utility itemsets. Expert Systems with Applications: An International Journal 42:13, 5754-5778. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.1016/j.eswa.2015.02.051
Balcázar, J. 2015. Quantitative Redundancy in Partial Implications. Formal Concept Analysis, 3-20. Online publication date: 27-May-2015.
https://doi.org/10.1007/978-3-319-19545-2_1
Balcázar, J. and Dogbey, F. 2013. Evaluation of Association Rule Quality Measures through Feature Extraction. Proceedings of the 12th International Symposium on Advances in Intelligent Data Analysis XII - Volume 8207, 68-79. Online publication date: 17-Oct-2013.
https://dl.acm.org/doi/10.1007/978-3-642-41398-8_7
