research-article

Finding Interesting Associations without Support Pruning

Published: 01 January 2001 Publication History

Abstract

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known Apriori algorithm is effective only when the rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We analyze these algorithms and run experiments on real and synthetic data to compare their performance.
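
To make the idea of combining random sampling with hashing concrete, here is a minimal sketch of min hashing (one of the author tags of this article) used to find highly similar item pairs regardless of how rare the items are. It is an illustration under stated assumptions, not the algorithms from the paper: the function names, the use of Jaccard similarity as the similarity measure, the parameters `k` and `threshold`, and the toy data are all hypothetical. Each item is represented by the non-empty set of row (transaction) indices in which it occurs.

```python
import random
from itertools import combinations


def minhash_signatures(columns, num_rows, k, seed=0):
    """Build a k-value min-hash signature per item (hypothetical helper).

    columns maps an item name to the non-empty set of row indices
    (0 <= index < num_rows) in which the item occurs.
    """
    rng = random.Random(seed)
    signatures = {name: [] for name in columns}
    for _ in range(k):
        # A random permutation of the rows: rank[r] is the new position of row r.
        rank = list(range(num_rows))
        rng.shuffle(rank)
        for name, rows in columns.items():
            # The min-hash value is the smallest rank among the item's rows.
            signatures[name].append(min(rank[r] for r in rows))
    return signatures


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(rows_a, rows_b):
    """Exact Jaccard similarity of two row sets."""
    return len(rows_a & rows_b) / len(rows_a | rows_b)


def similar_pairs(columns, num_rows, k=200, threshold=0.5, seed=0):
    """Return item pairs whose exact Jaccard similarity is >= threshold.

    Candidates are selected from the signatures alone, then verified
    against the data, so false positives are removed; a truly similar
    pair can still be missed if its estimate falls below the threshold.
    """
    sigs = minhash_signatures(columns, num_rows, k, seed)
    candidates = [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if estimate_similarity(sigs[a], sigs[b]) >= threshold
    ]
    return [(a, b) for a, b in candidates
            if jaccard(columns[a], columns[b]) >= threshold]


# Toy data (hypothetical): two frequent items and two rare but correlated ones.
baskets = {
    "bread": set(range(0, 100, 2)),
    "milk": set(range(0, 100, 3)),
    "caviar": {3, 17, 42},
    "vodka": {3, 17, 42, 58},
}
pairs = similar_pairs(baskets, num_rows=100)
```

In this toy data the rare items occur in only 3 and 4 of the 100 rows, so a high support threshold would prune them before their similarity was ever examined. The signature comparison does not depend on support at all, which is the property the abstract is after. The sketch compares all pairs of signatures, which is quadratic in the number of items; it makes no attempt to reproduce the paper's techniques for avoiding that cost.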

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 13, Issue 1
January 2001
144 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. Data mining
  2. association rules
  3. locality sensitive hashing
  4. min hashing
  5. similarity metric
