
An effective hash-based algorithm for mining association rules

Published: 22 May 1995

Abstract

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appears in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor in overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Specifically, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that generating smaller candidate sets enables us to effectively trim the transaction database at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
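
The abstract does not spell out the hashing scheme, so the following is only a minimal sketch of how hash-based filtering of candidate 2-itemsets could work. It assumes integer item IDs; the function names, the bucket count, and the hash function `(a * 10 + b) % num_buckets` are illustrative assumptions rather than details stated above. The idea is that while counting single items in the first pass, every pair in each transaction is also hashed into a bucket counter, and a pair is kept as a candidate only if both items are large and its bucket total reaches the minimum support.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Illustrative hash for a sorted pair of integer item IDs (an assumption)."""
    a, b = pair
    return (a * 10 + b) % num_buckets


def candidate_2_itemsets(transactions, min_support, num_buckets=7):
    """Return candidate 2-itemsets that survive the hash-bucket filter."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Single pass: count individual items and hash every pair into a bucket.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # A bucket total is an upper bound on the support of each pair hashed
    # into it, so a pair whose bucket is below min_support cannot be large.
    return [
        pair
        for pair in combinations(large_items, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_support
    ]


transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
print(candidate_2_itemsets(transactions, min_support=2))
# → [(1, 3), (2, 3), (2, 5), (3, 5)]
```

In this toy run, pairing the four large items directly would give six candidate 2-itemsets; the bucket filter discards (1, 2) and (1, 5) before any pair is counted exactly. Bucket collisions can still let a non-large pair through, so the survivors remain candidates that need an exact support count.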

Published In

ACM SIGMOD Record, Volume 24, Issue 2
May 1995
490 pages
ISSN: 0163-5808
DOI: 10.1145/568271
  • SIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of data
    June 1995
    508 pages
    ISBN: 0897917316
    DOI: 10.1145/223784

Publisher

Association for Computing Machinery

New York, NY, United States
