DOI: 10.5555/789087.789726

Summary Structures for Frequency Queries on Large Transaction Sets

Published: 28 March 2000

    Abstract

    As large-scale databases become commonplace, there has been significant interest in mining them for commercial purposes. One of the basic tasks underlying many of these mining operations is querying transaction sets for the frequencies of specified attribute values. The size of these databases makes it important to develop summary structures that achieve high compression ratios while supporting fast frequency queries. The nature of the problem, and its differences from traditional text compression, allow very high compression ratios.

    In this paper, we propose a binary trie-based summary structure for representing transaction sets. We demonstrate that this trie structure, when augmented with an appropriate set of horizontal pointers, can support frequency queries several orders of magnitude faster than raw transaction data. We improve the memory characteristics of our scheme by compressing the trie into a Patricia trie and demonstrate that this does not have a significant adverse effect on frequency query time.

    We further reduce the size of this trie by selectively pruning branches to compute a dominant trie that is capable of approximate frequency querying. The complement trie, called the deviant trie, is also useful in many data mining applications. Recompressing the dominant trie into a Patricia trie compresses it further. Finally, we demonstrate that our binary compressed trie structure has better memory (compression) characteristics than related schemes. We support our claims with experimental results on datasets from the IBM synthetic association data generator.
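
To make the idea of a trie-based summary concrete, here is a minimal Python sketch of an uncompressed binary trie with per-node counts. It is an illustration under assumptions, not the paper's implementation: the abstract does not specify the node layout or the query procedure, so the names `Node`, `build_trie`, and `frequency` are hypothetical, items are assumed to be integers `0..num_items-1`, and the horizontal pointers, Patricia compression, and pruning described above are all omitted.

```python
class Node:
    """One trie node; count = number of transactions sharing this prefix."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = [None, None]  # index 0: item absent, 1: item present


def build_trie(transactions, num_items):
    """Insert each transaction as a fixed-length bit string, one bit per item."""
    root = Node()
    for transaction in transactions:
        items = set(transaction)
        node = root
        node.count += 1
        for item in range(num_items):
            bit = 1 if item in items else 0
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
            node.count += 1
    return root


def frequency(root, itemset):
    """Count the transactions that contain every item in `itemset`."""
    required = set(itemset)
    if not required:
        return root.count
    last = max(required)

    def walk(node, level):
        if node is None:
            return 0
        if level > last:
            # All required items were matched; the stored count covers
            # every transaction below this node.
            return node.count
        if level in required:
            return walk(node.children[1], level + 1)
        # Item at this level is unconstrained: explore both branches.
        return walk(node.children[0], level + 1) + walk(node.children[1], level + 1)

    return walk(root, 0)


transactions = [[0, 1], [0, 2], [0, 1, 2], [1]]
trie = build_trie(transactions, num_items=3)
print(frequency(trie, {0, 1}))
```

Because each node stores the number of transactions passing through it, a query can stop as soon as its last required item has been matched, without visiting the individual transactions. In this plain version a query still branches at every unconstrained level, which is the cost that additional structure such as the horizontal pointers mentioned in the abstract would be expected to reduce.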

    Cited By

    • (2017) Approximate Holistic Aggregation in Wireless Sensor Networks. ACM Transactions on Sensor Networks 13:2 (1-24). DOI: 10.1145/3027488. Online publication date: 19-Apr-2017
    • (2004) A Support-Ordered Trie for Fast Frequent Itemset Discovery. IEEE Transactions on Knowledge and Data Engineering 16:7 (875-879). DOI: 10.1109/TKDE.2004.1318569. Online publication date: 1-Jul-2004
    • (2003) Beyond Independence. IEEE Transactions on Knowledge and Data Engineering 15:6 (1409-1421). DOI: 10.1109/TKDE.2003.1245281. Online publication date: 1-Nov-2003


      Published In

      DCC '00: Proceedings of the Conference on Data Compression
      March 2000
      ISBN: 0769505929

      Publisher

      IEEE Computer Society

      United States

      Author Tags

      1. Compressed Tries
      2. Compression
      3. Frequency Queries
      4. Frequent Sets
      5. Patricia Tries
      6. Transactions
      7. Tries
