On graph query optimization in large networks

Published: 01 September 2010

Abstract

The dramatic proliferation of sophisticated networks has resulted in a growing need for effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search for graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NP-complete nature of subgraph isomorphism. It becomes even more challenging when the network examined is large and diverse. In this paper, we present a high-performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhoods as basic indexing units, which prove to be both effective in pruning the graph search space and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertex-at-a-time fashion in a more efficient path-at-a-time way: the query is first decomposed into a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; candidate paths are then joined together to recover the query graph and finalize the graph query processing. We evaluate SPath against the state-of-the-art GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method for addressing graph queries on large networks.
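
To make the idea of neighborhood-based pruning concrete, here is a minimal Python sketch. It is not SPath's actual index or algorithm, which the abstract does not specify. It only illustrates one plausible filter in that spirit: for each vertex, record which labels occur at each shortest-path distance up to a small radius, and discard a data vertex as a match for a query vertex when the query's labels within distance d are not all present within distance d of the data vertex. The function names, the `radius` parameter, and the toy graphs are assumptions made for this example. Path selection by a query plan optimizer and the path join step are not covered.

```python
from collections import deque


def labels_by_distance(adj, labels, source, radius):
    """BFS from `source`; levels[d] holds the labels seen at shortest distance d."""
    dist = {source: 0}
    levels = [set() for _ in range(radius + 1)]  # levels[0] stays empty
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the chosen radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                levels[dist[w]].add(labels[w])
                queue.append(w)
    return levels


def passes_filter(q_levels, g_levels):
    """Sound filter: an embedding can only shrink distances, so every label
    within distance d of the query vertex must occur within distance d of
    the data vertex."""
    q_seen, g_seen = set(), set()
    for q_d, g_d in zip(q_levels, g_levels):
        q_seen |= q_d
        g_seen |= g_d
        if not q_seen <= g_seen:
            return False
    return True


def candidates(q_adj, q_labels, g_adj, g_labels, radius=2):
    """Map each query vertex to the data vertices that survive the filter."""
    g_sigs = {v: labels_by_distance(g_adj, g_labels, v, radius) for v in g_adj}
    result = {}
    for u in q_adj:
        q_sig = labels_by_distance(q_adj, q_labels, u, radius)
        result[u] = {
            v for v in g_adj
            if g_labels[v] == q_labels[u] and passes_filter(q_sig, g_sigs[v])
        }
    return result


# Toy data graph (hypothetical): a path 1-2-3 labeled A-B-C, plus an edge 4-5 labeled A-B.
G_ADJ = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
G_LABELS = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}

# Toy query graph (hypothetical): a path a-b-c labeled A-B-C.
Q_ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
Q_LABELS = {"a": "A", "b": "B", "c": "C"}

CANDIDATES = candidates(Q_ADJ, Q_LABELS, G_ADJ, G_LABELS, radius=2)
```

In this toy case, vertices 4 and 5 carry the right labels but are filtered out because no vertex labeled C lies within the required distance, so only the path 1-2-3 remains to be verified. The filter is a necessary condition only; surviving candidates would still need a verification or join step.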


Published In

Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment

Qualifiers

  • Research-article
