Efficiently Mining Frequent Embedded Unordered Trees

Published: 01 November 2004
Abstract

    Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semi-structured data. In this paper we introduce SLEUTH, an efficient algorithm for mining frequent, unordered, embedded subtrees in a database of labeled trees. The key contributions of our work are as follows: (1) we give the first algorithm that enumerates all embedded, unordered trees; (2) we propose a new equivalence class extension scheme to generate all candidate trees; (3) we extend the notion of scope-list joins to compute the frequency of unordered trees; and (4) we conduct a performance evaluation on several synthetic and real datasets, showing that SLEUTH is an efficient algorithm whose performance is comparable to that of TreeMiner, which mines only ordered trees.
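
To make the terms "embedded" and "unordered" concrete, the following is a minimal brute-force sketch, not SLEUTH itself: it uses neither equivalence class extension nor scope-list joins. It assumes a common reading of the definitions: a pattern edge may map to any ancestor-descendant pair in the data tree (embedded), sibling order is ignored (unordered), and pattern siblings must map to nodes that are not ancestors of one another. The names `Tree`, `contains`, and `support` are hypothetical and chosen only for illustration.

```python
from functools import lru_cache


class Tree:
    """Rooted labeled tree stored in preorder.

    Built from nested (label, [children]) tuples. last[i] is the preorder
    index of the last descendant of node i, so the descendants of i are
    exactly the nodes i+1 .. last[i].
    """

    def __init__(self, nested):
        self.labels, self.last, self.children = [], [], []
        self._build(nested)

    def _build(self, node):
        label, kids = node
        i = len(self.labels)
        self.labels.append(label)
        self.last.append(i)
        self.children.append([])
        for kid in kids:
            self.children[i].append(self._build(kid))
        self.last[i] = len(self.labels) - 1
        return i


def contains(data, pattern):
    """True if pattern occurs in data as an embedded, unordered subtree."""

    def disjoint(u, v):
        # Two nodes are unrelated when the earlier one's scope ends
        # before the later one begins.
        lo, hi = min(u, v), max(u, v)
        return data.last[lo] < hi

    @lru_cache(maxsize=None)
    def embeds(p, t):
        # Can the pattern subtree rooted at p be embedded with p mapped to t?
        if pattern.labels[p] != data.labels[t]:
            return False
        return place(pattern.children[p], t, [])

    def place(kids, t, used):
        # Backtracking: map each pattern child to a proper descendant of t
        # whose scope is disjoint from the scopes already chosen.
        if not kids:
            return True
        first, rest = kids[0], kids[1:]
        for cand in range(t + 1, data.last[t] + 1):
            if all(disjoint(cand, u) for u in used) and embeds(first, cand):
                if place(rest, t, used + [cand]):
                    return True
        return False

    return any(embeds(0, t) for t in range(len(data.labels)))


def support(database, pattern):
    """Number of database trees that contain the pattern at least once."""
    return sum(contains(tree, pattern) for tree in database)


if __name__ == "__main__":
    data = Tree(("a", [("b", [("c", [])]), ("d", [])]))
    reordered = Tree(("a", [("d", []), ("c", [])]))
    chained = Tree(("a", [("b", []), ("c", [])]))
    print(contains(data, reordered))
    print(contains(data, chained))
```

Under these assumptions, the pattern `a(d, c)` is contained in the data tree `a(b(c), d)` even though `c` is a grandchild of `a` and the siblings appear in the opposite order. The pattern `a(b, c)` is not contained, because the only `c` in the data tree lies below `b`. The search is exponential in the worst case and is meant only to illustrate the problem definition.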

        Published In

        Fundamenta Informaticae  Volume 66, Issue 1-2
        Advances in Mining Graphs, Trees and Sequences
        November 2004
        201 pages

        Publisher

        IOS Press

        Netherlands

        Author Tags

        1. Embedded Trees
        2. Tree Mining
        3. Unordered Trees
