TRIPS and TIDES: new algorithms for tree mining

Published: 06 November 2006

Abstract

    Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns such as trees and graphs. In this paper, we address the problem of frequent subtree mining, which has proven viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate the use of two popular sequential encodings of trees to systematically generate and evaluate the candidate patterns. The proposed approach is generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious because of the compact and simple array-based data structures we use: L1 and L2 cache hit rates above 99% are typically observed. Experimental evaluation shows that our algorithms can achieve speedups of up to several orders of magnitude on real datasets when compared to state-of-the-art tree mining algorithms.
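
The abstract does not say which variants of the sequential encodings are used; the author tags name Prüfer sequences and depth-first order codes. As a rough illustration only, the sketch below flattens a small rooted tree into two flat arrays. It assumes a common depth-first convention (labels in preorder, with a "-1" marker on each backtrack) and a rooted Prüfer-style variant (repeatedly delete the smallest-numbered leaf and record its parent until only the root remains). The toy tree, the names CHILDREN, LABELS, ROOT, dfs_code, and prufer_sequence are hypothetical and are not taken from the paper.

```python
import heapq
from typing import Dict, List

# Hypothetical toy tree, nodes numbered in postorder:
#        4:A
#       /   \
#     2:B   3:C
#      |
#     1:D
CHILDREN: Dict[int, List[int]] = {4: [2, 3], 2: [1], 1: [], 3: []}
LABELS: Dict[int, str] = {4: "A", 2: "B", 1: "D", 3: "C"}
ROOT = 4


def dfs_code(children: Dict[int, List[int]],
             labels: Dict[int, str],
             root: int) -> List[str]:
    """Preorder label sequence with "-1" emitted on each backtrack (assumed convention)."""
    code: List[str] = []

    def visit(node: int) -> None:
        code.append(labels[node])
        for child in children[node]:
            visit(child)
            code.append("-1")  # moving back up from child to node

    visit(root)
    return code


def prufer_sequence(children: Dict[int, List[int]], root: int) -> List[int]:
    """Delete the smallest-numbered non-root leaf repeatedly, recording its parent.

    Assumed rooted variant: deletion continues until only the root is left,
    so a tree with n nodes yields n - 1 entries.
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    remaining = {n: len(cs) for n, cs in children.items()}
    leaves = [n for n, k in remaining.items() if k == 0 and n != root]
    heapq.heapify(leaves)

    seq: List[int] = []
    while leaves:
        leaf = heapq.heappop(leaves)
        p = parent[leaf]
        seq.append(p)
        remaining[p] -= 1
        if remaining[p] == 0 and p != root:
            heapq.heappush(leaves, p)  # parent has become a leaf
    return seq


if __name__ == "__main__":
    print(dfs_code(CHILDREN, LABELS, ROOT))   # ['A', 'B', 'D', '-1', '-1', 'C', '-1']
    print(prufer_sequence(CHILDREN, ROOT))    # [2, 4, 4]
```

Both encodings turn a tree into a flat array, which is the kind of compact layout the abstract credits for its cache behavior; how the paper extends and matches patterns over these arrays is not shown here.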



      Published In

      CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
      November 2006
      916 pages
      ISBN:1595934332
      DOI:10.1145/1183614

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Prufer sequences
      2. depth first order codes
      3. embedding lists
      4. frequent patterns
      5. tree mining

      Qualifiers

      • Article

      Conference

      CIKM06: Conference on Information and Knowledge Management
      November 6-11, 2006
      Arlington, Virginia, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


      Cited By

      • (2023) Mining Frequent Infix Patterns from Concurrency-Aware Process Execution Variants. Proceedings of the VLDB Endowment 16:10 (2666-2678). DOI: 10.14778/3603581.3603603. Online publication date: 1-Jun-2023
      • (2020) Reversible Circuit Synthesis Time Reduction Based on Subtree-Circuit Mapping. Applied Sciences 10:12 (4147). DOI: 10.3390/app10124147. Online publication date: 16-Jun-2020
      • (2019) Short and Long-term Pattern Discovery Over Large-Scale Geo-Spatiotemporal Data. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2905-2913). DOI: 10.1145/3292500.3330755. Online publication date: 25-Jul-2019
      • (2017) Frequent subtree mining on the automata processor. Proceedings of the International Conference on Supercomputing (1-11). DOI: 10.1145/3079079.3079084. Online publication date: 14-Jun-2017
      • (2017) Homomorphic Pattern Mining from a Single Large Data Tree. Data Science and Engineering 1:4 (203-218). DOI: 10.1007/s41019-016-0028-7. Online publication date: 10-Jan-2017
      • (2017) Efficiently Discovering Most-Specific Mixed Patterns from Large Data Trees. Database Systems for Advanced Applications (279-294). DOI: 10.1007/978-3-319-55753-3_18. Online publication date: 22-Mar-2017
      • (2016) Mining rooted ordered trees under subtree homeomorphism. Data Mining and Knowledge Discovery 30:5 (1249-1272). DOI: 10.1007/s10618-015-0439-5. Online publication date: 1-Sep-2016
      • (2016) Transactional Tree Mining. European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 9851 (182-198). DOI: 10.1007/978-3-319-46128-1_12. Online publication date: 19-Sep-2016
      • (2015) Ordered subtree mining via transactional mapping using a structure-preserving tree database schema. Information Sciences: an International Journal 310:C (97-117). DOI: 10.1016/j.ins.2015.03.015. Online publication date: 20-Jul-2015
      • (2015) Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees. Database Systems for Advanced Applications (3-20). DOI: 10.1007/978-3-319-18120-2_1. Online publication date: 9-Apr-2015
