Frequent tree pattern mining: A survey

Published: 15 November 2010
    Abstract

    The use of non-linear data structures is becoming more and more common in many data mining scenarios. Trees, in particular, have drawn the attention of researchers as the simplest of non-linear data structures. Many tree mining algorithms have been proposed in the literature and this paper surveys some of the recent work that has been performed in this area. We examine some of the most relevant tree mining algorithms and compare them in order to highlight their similarities and differences.

            Published In

            Intelligent Data Analysis, Volume 14, Issue 6
            November 2010
            199 pages

            Publisher

            IOS Press

            Netherlands

            Author Tags

            1. Data mining
            2. frequent patterns
            3. tree patterns

            Qualifiers

            • Article

            Cited By

            • (2018) Mining Abstract XML Data-Types. ACM Transactions on the Web 13:1 (1-37). DOI: 10.1145/3267467. Online publication date: 4-Dec-2018
            • (2014) Mining idioms from source code. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (472-483). DOI: 10.1145/2635868.2635901. Online publication date: 11-Nov-2014
            • (2011) How to use "classical" tree mining algorithms to find complex spatio-temporal patterns? Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II (107-117). DOI: 10.5555/2033546.2033558. Online publication date: 29-Aug-2011
            • (2011) Mining patterns from longitudinal studies. Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II (166-179). DOI: 10.1007/978-3-642-25856-5_13. Online publication date: 17-Dec-2011
