research-article

Finding Interesting Associations without Support Pruning

Authors:

Shinji Fujiwara,

Aristides Gionis,

Rajeev Motwani,

Jeffrey D. Ullman,

Cheng Yang

IEEE Transactions on Knowledge and Data Engineering, Volume 13, Issue 1

Pages 64 - 78

https://doi.org/10.1109/69.908981

Published: 01 January 2001

Abstract

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is effective only when the rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We analyze the algorithms and evaluate their comparative performance experimentally on real and synthetic data.
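The abstract names the general approach (random sampling combined with hashing) but not the algorithms themselves. As an illustration only, the sketch below uses min-hashing with locality-sensitive banding, a well-known hashing technique for finding highly similar pairs of sets; it is not presented as the authors' exact algorithms. It assumes that each item is a column represented by the set of row (basket) ids containing it, and that "highly correlated" means high Jaccard similarity between two columns. The hash family `(a*r + b) mod PRIME`, the function names, and the parameters `num_hashes`, `bands`, and `threshold` are hypothetical choices made for this example.

```python
import random
from itertools import combinations

# Mersenne prime 2**61 - 1, used as the modulus of the hash family (an assumption).
PRIME = (1 << 61) - 1


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two row-id sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def minhash_signatures(columns, num_hashes, seed=0):
    """For each column (a non-empty set of row ids), record the minimum value of
    each of num_hashes random linear hash functions h(r) = (a*r + b) mod PRIME.
    The hash functions stand in for random permutations of the rows."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    return {
        name: [min((a * r + b) % PRIME for r in rows) for a, b in params]
        for name, rows in columns.items()
    }


def estimate_similarity(sig_a, sig_b):
    """Fraction of hash functions on which two columns share the same minimum;
    this fraction is an estimate of their Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands):
    """LSH banding: cut each signature into `bands` equal bands (any leftover
    values are ignored); columns that agree on an entire band become candidates."""
    length = len(next(iter(signatures.values())))
    rows_per_band = length // bands
    candidates = set()
    for band in range(bands):
        start = band * rows_per_band
        buckets = {}
        for name, sig in signatures.items():
            key = tuple(sig[start:start + rows_per_band])
            buckets.setdefault(key, []).append(name)
        for names in buckets.values():
            candidates.update(combinations(sorted(names), 2))
    return candidates


def similar_pairs(columns, threshold, num_hashes=100, bands=20, seed=0):
    """Generate candidate pairs by hashing, then verify each one exactly,
    so no item is ever discarded for having low support."""
    sigs = minhash_signatures(columns, num_hashes, seed)
    return sorted(
        (a, b) for a, b in candidate_pairs(sigs, bands)
        if jaccard(columns[a], columns[b]) >= threshold
    )


# Usage: 1,000 hypothetical baskets.
columns = {
    "x": {7, 311, 902},              # rare item: 3 of 1,000 baskets
    "y": {7, 311, 902},              # always appears together with x
    "z": set(range(0, 1000, 2)),     # frequent item: 500 of 1,000 baskets
}
print(similar_pairs(columns, threshold=0.8))  # prints [('x', 'y')]
```

In this toy data, items `x` and `y` each occur in only 0.3% of the baskets, so an a-priori pass with, say, a 1% support threshold would prune them before any pair counting; the hashing approach still reports the pair because it filters on similarity rather than support. Candidates from the banding step are verified exactly, which removes false positives, although a truly similar pair can occasionally be missed if it collides in no band.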



Index Terms

Information systems

Information

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 13, Issue 1

January 2001

144 pages

ISSN:1041-4347

Copyright © 2001 IEEE. All Rights Reserved.

Publisher

IEEE Educational Activities Department

United States


Bibliometrics

Total citations: 107

Total downloads: 0 (0 in the last 12 months, 0 in the last 6 weeks; reflects downloads up to 09 Nov 2024)

Cited By

Metwally A, Shum M, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). Similarity Joins of Sparse Features. Companion of the 2024 International Conference on Management of Data, pp. 80-92. DOI: 10.1145/3626246.3653370. Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3653370
Karjalainen M, Galbrun E, Miettinen P (2024). Fast Redescription Mining Using Locality-Sensitive Hashing. Machine Learning and Knowledge Discovery in Databases. Research Track, pp. 124-142. DOI: 10.1007/978-3-031-70368-3_8. Online publication date: 8-Sep-2024.
https://dl.acm.org/doi/10.1007/978-3-031-70368-3_8
Liu K, Wang S, Zhang Y, Xing C (2023). An Efficient Algorithm for Distance-based Structural Graph Clustering. Proceedings of the ACM on Management of Data 1:1, pp. 1-25. DOI: 10.1145/3588725. Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1145/3588725
Shayegan M, Faizollahi-Samarin M (2022). An extended version of sectional MinHash method for near-duplicate detection. The Journal of Supercomputing 78:13, pp. 15638-15662. DOI: 10.1007/s11227-022-04447-x. Online publication date: 1-Sep-2022.
https://dl.acm.org/doi/10.1007/s11227-022-04447-x
Ertl O (2021). SetSketch. Proceedings of the VLDB Endowment 14:11, pp. 2244-2257. DOI: 10.14778/3476249.3476276. Online publication date: 27-Oct-2021.
https://dl.acm.org/doi/10.14778/3476249.3476276
Bury M, Gentili M, Schwiegelshohn C, Sorella M (2021). Polynomial Time Approximation Schemes for All 1-Center Problems on Metric Rational Set Similarities. Algorithmica 83:5, pp. 1371-1392. DOI: 10.1007/s00453-020-00787-3. Online publication date: 1-May-2021.
https://dl.acm.org/doi/10.1007/s00453-020-00787-3
Chuang Y, Li F (2020). TCR: a trustworthy and churn-resilient academic distribution and retrieval system in P2P networks. The Journal of Supercomputing 76:9, pp. 7107-7139. DOI: 10.1007/s11227-020-03146-9. Online publication date: 1-Sep-2020.
https://dl.acm.org/doi/10.1007/s11227-020-03146-9
Chenli C, Jung T (2020). ProvNet: Networked Blockchain for Decentralized Secure Provenance. Blockchain – ICBC 2020, pp. 76-93. DOI: 10.1007/978-3-030-59638-5_6. Online publication date: 18-Sep-2020.
https://dl.acm.org/doi/10.1007/978-3-030-59638-5_6
Li L, Zheng S, Cai J, Li J (2020). Efficient Source Selection for Error Detection via Matching Dependencies. Database Systems for Advanced Applications, pp. 211-227. DOI: 10.1007/978-3-030-59410-7_13. Online publication date: 24-Sep-2020.
https://dl.acm.org/doi/10.1007/978-3-030-59410-7_13
Benito-Picazo F, Cordero P, Enciso M, Mora A (2019). Minimal generators, an affordable approach by means of massive computation. The Journal of Supercomputing 75:3, pp. 1350-1367. DOI: 10.1007/s11227-018-2453-z. Online publication date: 1-Mar-2019.
https://dl.acm.org/doi/10.1007/s11227-018-2453-z
