DOI: 10.1145/1007568.1007607

Graph indexing: a frequent structure-based approach

Published: 13 June 2004

    Abstract

    Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Unlike existing path-based methods, our approach, called gIndex, uses frequent substructures as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable under database updates. To reduce the size of the index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has an index 10 times smaller, yet achieves 3 to 10 times better performance than a typical path-based method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
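
    The general idea of indexing by frequent substructures can be illustrated with a small filter-then-verify sketch. This is not the gIndex implementation. It rests on loud simplifications: each graph is modelled as a set of labeled edges, and "fragment contained in graph" is approximated by set inclusion rather than subgraph isomorphism. The names `contains`, `min_support`, `build_index`, and `query`, and the linear threshold in `min_support`, are hypothetical choices made only for illustration.

```python
from itertools import combinations

# Toy model (an assumption, not the paper's representation): a graph is a
# frozenset of labeled edges, and containment is plain set inclusion, which
# stands in for a real subgraph isomorphism test.
def contains(graph, fragment):
    return fragment <= graph

def min_support(size, base=1, step=1):
    """Hypothetical size-increasing threshold: a fragment with more edges
    must occur in more graphs before it is kept in the index."""
    return base + step * (size - 1)

def build_index(db, max_size=2):
    """Map each sufficiently frequent fragment to the ids of graphs containing it."""
    postings = {}
    for gid, graph in db.items():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(graph), size):
                postings.setdefault(frozenset(combo), set()).add(gid)
    # Keep only fragments that meet the threshold for their size.
    return {f: ids for f, ids in postings.items()
            if len(ids) >= min_support(len(f))}

def query(db, index, q):
    """Filter candidates by intersecting posting lists, then verify each one."""
    candidates = set(db)
    for fragment, ids in index.items():
        if contains(q, fragment):
            candidates &= ids
    return sorted(gid for gid in candidates if contains(db[gid], q))

db = {
    1: frozenset({("C", "C"), ("C", "O")}),
    2: frozenset({("C", "C"), ("C", "N")}),
    3: frozenset({("C", "C"), ("C", "O"), ("C", "N")}),
}
index = build_index(db)
print(query(db, index, frozenset({("C", "C"), ("C", "O")})))  # prints [1, 3]
```

    In this toy database the two-edge fragment made of the C-O and C-N edges occurs only in graph 3, so it falls below the size-2 threshold and is left out of the index. Queries that contain it are still answered correctly, because every candidate that survives filtering is verified against the query.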



    Published In

    SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
    June 2004
    988 pages
    ISBN: 1581138598
    DOI: 10.1145/1007568

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    SIGMOD/PODS04

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
