Using quantitative information for efficient association rule generation

Published: 01 December 2000

Abstract

The problem of mining association rules in categorical data present in customer transactions was introduced by Agrawal, Imielinski and Swami [2]. This seminal work gave rise to several research efforts [4, 13] that describe how to extend the original concepts and how to improve the performance of the related algorithms.

The original problem of mining association rules was formulated as how to find rules of the form set1 → set2. Such a rule is meant to denote affinity or correlation between two sets containing nominal or ordinal data items. More specifically, an association rule should convey the following meaning: customers that buy the products in set1 also buy the products in set2. The statistical basis is given by minimum support and confidence measures of these rules with respect to the set of customer transactions.

The original problem as proposed by Agrawal et al. [2] was extended in several directions, such as adding to or replacing the confidence and support with other measures, filtering the rules during or after generation, or including quantitative attributes. Srikant and Agrawal [16] describe a new approach in which quantitative data can be treated as categorical. This is very important, since otherwise part of the customer transaction information is discarded.

Whenever an extension is proposed, its performance must be checked. The efficiency of an algorithm determines the size of the database that can be treated. It is therefore crucial to have efficient algorithms that enable us to examine and extract valuable decision-making information from ever larger databases.

In this paper we present an algorithm that can be used in the context of several of the extensions provided in the literature while preserving its performance, as demonstrated in a case study. The approach in our algorithm is to explore multidimensional properties of the data (provided such properties are present), allowing us to combine this additional information in a very efficient pruning phase. This results in a very flexible and efficient algorithm that was used with success in several experiments using categorical and quantitative databases.

The paper is organized as follows. In the next section we describe quantitative association rules and present an algorithm to generate them. Section 3 presents an optimization of the pruning phase of the Apriori [4] algorithm based on quantitative information associated with the items. Section 4 presents our experimental results for mining four synthetic workloads, followed by some related work in Section 5. Finally, we present some conclusions and future work in Section 6.
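
To make the support and confidence measures mentioned in the abstract concrete, the following is a minimal Python sketch, not the algorithm presented in the paper. It computes both measures for a rule set1 → set2 over a toy list of transactions, and it shows the standard Apriori candidate check (a k-itemset is kept only if all of its (k-1)-subsets are frequent), which is the pruning phase the paper sets out to optimize. The function names `support`, `confidence`, and `survives_apriori_pruning`, as well as the sample transactions, are assumptions made for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(set1, set2, transactions):
    """Confidence of set1 -> set2: support(set1 | set2) / support(set1)."""
    base = support(set1, transactions)
    if base == 0:
        return 0.0
    both = frozenset(set1) | frozenset(set2)
    return support(both, transactions) / base


def survives_apriori_pruning(candidate, frequent_smaller):
    """Standard Apriori check (not the paper's optimized variant):
    keep a k-itemset only if every (k-1)-subset is already frequent."""
    candidate = frozenset(candidate)
    k = len(candidate)
    return all(
        frozenset(subset) in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical toy data: each transaction is the set of products bought.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

In this toy data, the rule {bread} → {milk} holds in two of the four transactions, and two of the three transactions containing bread also contain milk. A rule would be reported only if both values meet user-chosen minimum thresholds.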

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Database mining: A performance perspective. In IEEE Transactions on Knowledge and Data Engineering, December 1993.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD, May 1993.
[3] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining.
[4] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In The 20th VLDB, September 1994.
[5] Y. Aumann and Y. Lindell. A statistical theory for quantitative association rules. In Fifth ACM SIGKDD, August 1999.
[6] R. Bayardo and R. Agrawal. Mining the most interesting rules. In Fifth ACM SIGKDD, August 1999.
[7] R. Bayardo, R. Agrawal, and D. Gunopulos. Constraint-based rule mining in large, dense databases. In Proceedings of the 15th Intl. Conf. on Data Engineering, March 1999.
[8] J. Bentley. Multidimensional binary search trees used for associative searching. In Communications of ACM, September 1975.
[9] M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. In 1st Intl. Conf. on Knowledge Discovery and Data Mining, August 1995.
[10] M. Houtsma and A. Swami. Set-oriented mining of association rules. Technical Report RJ 9567, IBM Almaden Research Center, October 1993.
[11] B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Fifth ACM SIGKDD, August 1999.
[12] R. Miller and Y. Yang. Association rules over interval data. In Proceedings of the ACM SIGMOD, May 1997.
[13] J. Park, M. Chen, and P. Yu. An effective hash based algorithm for mining associative rules. In Proceedings of the ACM SIGMOD, May 1995.
[14] B. Pôssas, F. Ruas, W. Meira, and R. Resende. Geração de regras de associação quantitativas. In XIV SBBD, September 1999.
[15] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In The 21st VLDB, September 1995.
[16] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proceedings of the ACM SIGMOD, June 1996.
[17] G. Webb. Opus: An efficient admissible algorithm for unordered search. In Journal of Artificial Intelligence Research, 1995.
[18] Z. Zhang, Y. Lu, and B. Zhang. An effective partitioning-combining algorithm for discovering quantitative association rules. In First Pacific Asia Conf. on Knowledge Discovery and Datamining, February 1997.

Cited By

  • (2016) Organizing standardized electronic healthcare records data for mining. Health Policy and Technology 5:3 (226-242). DOI: 10.1016/j.hlpt.2016.03.006. Online publication date: Sep-2016
  • (2002) Unified descriptive language for association rules in data mining. Computational intelligence and applications (227-232). DOI: 10.5555/989710.775001. Online publication date: 1-Jan-2002

Published In

ACM SIGMOD Record, Volume 29, Issue 4
Dec. 2000
61 pages
ISSN: 0163-5808
DOI: 10.1145/369275

Publisher

Association for Computing Machinery
New York, NY, United States
