Article

TRIPS and TIDES: new algorithms for tree mining

Authors: Shirish Tatikonda, Srinivasan Parthasarathy, Tahsin Kurc

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

Pages 455 - 464

https://doi.org/10.1145/1183614.1183680

Published: 06 November 2006

Abstract

Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns like trees and graphs. In this paper, we address the problem of frequent subtree mining that has proven to be viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate the use of two popular sequential encodings of trees to systematically generate and evaluate the candidate patterns. The proposed approach is very generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious in nature because of the compact and simple array-based data structures we use. Typically, L1 and L2 hit rates above 99% are observed. Experimental evaluation showed that our algorithms can achieve up to several orders of magnitude speedup on real datasets when compared to state-of-the-art tree mining algorithms.
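
To make the idea of a sequential tree encoding concrete, here is a minimal Python sketch. The abstract does not say which two encodings the algorithms use, so the two shown here are assumptions chosen for illustration: a depth-first label sequence with backtrack markers, and a Prüfer-style sequence over postorder node numbers that continues until only the root remains (length n-1). The tuple-based tree representation and the function names `dfs_encoding` and `prufer_encoding` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical toy representation: a node is (label, [child nodes]).
Tree = Tuple[str, list]


def dfs_encoding(tree: Tree, backtrack: str = "-1") -> List[str]:
    """Preorder label sequence; a backtrack marker is emitted each time
    the traversal returns from a child to its parent."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(dfs_encoding(child, backtrack))
        seq.append(backtrack)
    return seq


def prufer_encoding(tree: Tree) -> Tuple[List[int], List[str]]:
    """Number nodes 1..n in postorder, then list the parent of nodes 1..n-1.

    With postorder numbering this equals the sequence obtained by repeatedly
    deleting the smallest-numbered leaf and recording its parent.
    Returns (parent numbers, parent labels).
    """
    parents = {}  # node number -> parent number
    labels = {}   # node number -> label
    counter = 0

    def visit(node: Tree) -> int:
        nonlocal counter
        label, children = node
        child_ids = [visit(c) for c in children]
        counter += 1  # a node is numbered after all of its descendants
        labels[counter] = label
        for cid in child_ids:
            parents[cid] = counter
        return counter

    root = visit(tree)  # the root receives the largest number, n
    numbered = [parents[i] for i in range(1, root)]
    labeled = [labels[p] for p in numbered]
    return numbered, labeled


# Example: A has children B and E; B has children C and D.
tree = ("A", [("B", [("C", []), ("D", [])]), ("E", [])])
print(dfs_encoding(tree))     # ['A', 'B', 'C', '-1', 'D', '-1', '-1', 'E', '-1']
print(prufer_encoding(tree))  # ([3, 3, 5, 5], ['B', 'B', 'A', 'A'])
```

Both encodings flatten the tree into plain arrays, which is the kind of compact, array-based layout the abstract credits for cache-friendly behavior. How the actual algorithms generate and count candidates over such sequences is not described here.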

Index Terms

Information systems → Information systems applications → Data mining

Published In

CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

November 2006

916 pages

ISBN:1595934332

DOI:10.1145/1183614

General Chair: Philip S. Yu, IBM T.J. Watson Research Center (USA)

Program Chairs: Vassilis Tsotras, University of California-Riverside (USA); Edward Fox, Virginia Tech (USA); Bing Liu, University of Illinois at Chicago (USA)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM06: Conference on Information and Knowledge Management

November 6 - 11, 2006

Arlington, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 27
Total Downloads: 690
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Aug 2024

Cited By

Martini M, Schuster D, van der Aalst W (2023). Mining Frequent Infix Patterns from Concurrency-Aware Process Execution Variants. Proceedings of the VLDB Endowment, 16:10, 2666-2678. Online publication date: 1-Jun-2023.
https://dl.acm.org/doi/10.14778/3603581.3603603

Hawash A, Awad A, Abdalhaq B (2020). Reversible Circuit Synthesis Time Reduction Based on Subtree-Circuit Mapping. Applied Sciences, 10:12, 4147. Online publication date: 16-Jun-2020.
https://doi.org/10.3390/app10124147

Moosavi S, Samavatian M, Nandi A, Parthasarathy S, Ramnath R, Teredesai A, Kumar V, Li Y, Rosales R, Terzi E, Karypis G (2019). Short and Long-term Pattern Discovery Over Large-Scale Geo-Spatiotemporal Data. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2905-2913. Online publication date: 25-Jul-2019.
https://dl.acm.org/doi/10.1145/3292500.3330755

Sadredini E, Rahimi R, Wang K, Skadron K, Gropp W, Beckman P, Li Z, Cazorla F (2017). Frequent subtree mining on the automata processor. Proceedings of the International Conference on Supercomputing, 1-11. Online publication date: 14-Jun-2017.
https://dl.acm.org/doi/10.1145/3079079.3079084

Wu X, Theodoratos D (2017). Homomorphic Pattern Mining from a Single Large Data Tree. Data Science and Engineering, 1:4, 203-218. Online publication date: 10-Jan-2017.
https://doi.org/10.1007/s41019-016-0028-7

Wu X, Theodoratos D (2017). Efficiently Discovering Most-Specific Mixed Patterns from Large Data Trees. Database Systems for Advanced Applications, 279-294. Online publication date: 22-Mar-2017.
https://doi.org/10.1007/978-3-319-55753-3_18

Haghir Chehreghani M, Bruynooghe M (2016). Mining rooted ordered trees under subtree homeomorphism. Data Mining and Knowledge Discovery, 30:5, 1249-1272. Online publication date: 1-Sep-2016.
https://dl.acm.org/doi/10.1007/s10618-015-0439-5

Haghir Chehreghani M, Haghir Chehreghani M (2016). Transactional Tree Mining. European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 9851, 182-198. Online publication date: 19-Sep-2016.
https://dl.acm.org/doi/10.1007/978-3-319-46128-1_12

Hadzic F, Hecker M, Tagarelli A (2015). Ordered subtree mining via transactional mapping using a structure-preserving tree database schema. Information Sciences: an International Journal, 310:C, 97-117. Online publication date: 20-Jul-2015.
https://dl.acm.org/doi/10.1016/j.ins.2015.03.015

Wu X, Theodoratos D (2015). Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees. Database Systems for Advanced Applications, 3-20. Online publication date: 9-Apr-2015.
https://doi.org/10.1007/978-3-319-18120-2_1