Mining frequent patterns without candidate generation

Authors:

Yiwen Yin

ACM SIGMOD Record, Volume 29, Issue 2

Pages 1 - 12

https://doi.org/10.1145/335191.335372

Published: 16 May 2000

Abstract

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when patterns are prolific, long, or both.

In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
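The three techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `FPNode`, `build_tree`, and `fp_growth` and the toy transactions are assumptions made for illustration. The sketch also simplifies two things: the header table stores a list of nodes per item rather than node-links, and there is no special handling of single-path trees. It assumes items are mutually comparable (for example, strings) so that ties in frequency can be broken deterministically.

```python
from collections import defaultdict


class FPNode:
    """One node of the prefix tree: an item, a count, and links up and down."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (root, header, frequent), where header maps each frequent item
    to the tree nodes carrying it and frequent maps each item to its support.
    """
    # First scan: count the support of every item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Second scan: insert each transaction, keeping only frequent items,
    # sorted by descending support so that shared prefixes are merged.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    _, header, frequent = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, frequent[item]
        # Conditional pattern base: the prefix path above each node of
        # `item`, weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database.
        yield from fp_growth(conditional, min_support, pattern)


# Toy data, invented for this example.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
patterns = dict(fp_growth([(t, 1) for t in transactions], min_support=3))
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

The original database is scanned only twice, both times inside the top-level `build_tree` call. Every recursive call works on a conditional pattern base extracted from the tree, and each itemset is grown from an already frequent suffix, so no candidate sets are generated and tested.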

Index Terms

Mining frequent patterns without candidate generation
1. Information systems
  1. Data management systems
    1. Database design and models
  2. Information systems applications
    1. Data mining

Published In

ACM SIGMOD Record Volume 29, Issue 2

June 2000

609 pages

ISSN:0163-5808

DOI:10.1145/335191

Editors:
Weidong Chen (Southern Methodist Univ., Dallas, TX)
Jeffrey Naughton (Univ. of Wisconsin-Madison, Madison)
Philip A. Bernstein (Microsoft)

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN:1581132174
DOI:10.1145/342009
Chairmen:
Maggie Dunham (Southern Methodist Univ.)
Jeffrey F. Naughton (Univ. of Wisconsin-Madison)
Weidong Chen (Southern Methodist Univ.)
Nick Koudas (AT&T Labs)

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article
