Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2939672.2939856acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Scalable Pattern Matching over Compressed Graphs via Dedensification

Published: 13 August 2016 Publication History

Abstract

One of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching operation. Consequently, there has been a significant amount of research effort in optimizing graph matching operations in graph database systems. As graph databases have scaled in recent years, so too has recent work on scaling graph matching operations. However, the performance of recent proposals for scaling graph pattern matching is limited by the presence of high-degree nodes. These high-degree nodes result in an explosion of intermediate result sizes during query execution, and therefore significant performance bottlenecks. In this paper we present a dedensification technique that losslessly compresses the neighborhood around high-degree nodes. Furthermore, we introduce a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data. For pattern matching operations, we show how this technique can be implemented as a layer above existing graph database systems, so that the end-user can benefit from this technique without requiring modifications to the core graph database engine code. Our technique reduces the size of the intermediate result sets during query processing, and thereby improves query performance.

References

[1]
A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, 1999.
[2]
J. Leskovec, J. M. Kleinberg, and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," TKDD, vol. 1, no. 1, 2007.
[3]
R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer, "Graph structure in the web - revisited: a trick of the heavy tail," in WWW, 2014.
[4]
C. Weiss, P. Karras, and A. Bernstein, "Hexastore: sextuple indexing for semantic web data management," PVLDB, vol. 1, no. 1, 2008.
[5]
T. Neumann and G. Weikum, "The RDF-3X engine for scalable management of RDF data," VLDB J., vol. 19, no. 1, pp. 91--113, 2010.
[6]
W. Fan, J. Li, X. Wang, and Y. Wu, "Query preserving graph compression," in SIGMOD, 2012, pp. 157--168.
[7]
W. Fan, X. Wang, and Y. Wu, "Querying big graphs within bounded resources," in SIGMOD, 2014, pp. 301--312.
[8]
V. Satuluri, S. Parthasarathy, and Y. Ruan, "Local graph sparsification for scalable clustering," in SIGMOD, 2011.
[9]
D. A. Spielman and N. Srivastava, "Graph sparsification by effective resistances," SIAM J. Comput., vol. 40, no. 6, 2011.
[10]
W. Fan, F. Geerts, Y. Cao, T. Deng, and P. Lu, "Querying big data by accessing small data," in PODS, 2015.
[11]
G. Buehrer and K. Chellapilla, "A scalable pattern mining approach to web graph compression with communities," in WSDM, 2008.
[12]
D. J. Abadi, A. Marcus, S. Madden, and K. Hollenbach, "SW-Store: a vertically partitioned DBMS for semantic web data management," VLDB J., vol. 18, no. 2, 2009.
[13]
A. Gubichev and T. Neumann, "Exploiting the query structure for efficient join ordering in SPARQL queries," in EDBT, 2014.
[14]
L. Zou, M. T. Özsu, L. Chen, X. Shen, R. Huang, and D. Zhao, "gStore: a graph-based SPARQL query engine," VLDB J., vol. 23, no. 4, 2014.
[15]
J. Leskovec and A. Krevl, "SNAP Datasets: Stanford large network dataset collection," http://snap.stanford.edu/data.
[16]
P. Boldi and S. Vigna, "The webgraph framework I: compression techniques," in WWW, 2004, pp. 595--602.
[17]
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan, "On compressing social networks," in KDD, 2009.
[18]
Y. Lim, U. Kang, and C. Faloutsos, "SlashBurn: Graph compression and mining beyond caveman communities," TKDE, vol. 26, no. 12, pp. 3077--3089, 2014.
[19]
X. Yan, P. S. Yu, and J. Han, "Graph indexing: A frequent structure-based approach," in SIGMOD, 2004, pp. 335--346.
[20]
B. Shao, H. Wang, and Y. Li, "Trinity: a distributed graph engine on a memory cloud," in SIGMOD, 2013.
[21]
J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang, "Fast graph pattern matching," in ICDE, 2008, pp. 913--922.

Cited By

View all
  • (2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Graph Summarization: Compactness Meets EfficiencyProceedings of the ACM on Management of Data10.1145/36549432:3(1-26)Online publication date: 30-May-2024
  • (2024)Quantivine: A Visualization Approach for Large-Scale Quantum Circuit Representation and AnalysisIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2023.332714830:1(573-583)Online publication date: 1-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 August 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. big graphs
  2. graph databases
  3. graph pattern matching
  4. power-law
  5. rdf
  6. scale-free graphs
  7. sparql

Qualifiers

  • Research-article

Funding Sources

Conference

KDD '16
Sponsor:

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)144
  • Downloads (Last 6 weeks)22
Reflects downloads up to 16 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Online Detection of Anomalies in Temporal Knowledge Graphs with InterpretabilityProceedings of the ACM on Management of Data10.1145/36988232:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Graph Summarization: Compactness Meets EfficiencyProceedings of the ACM on Management of Data10.1145/36549432:3(1-26)Online publication date: 30-May-2024
  • (2024)Quantivine: A Visualization Approach for Large-Scale Quantum Circuit Representation and AnalysisIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2023.332714830:1(573-583)Online publication date: 1-Jan-2024
  • (2024)FRESH: Towards Efficient Graph Queries in an Outsourced Graph2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00346(4545-4557)Online publication date: 13-May-2024
  • (2023)DG_summ: A schema-driven approach for personalized summarizing heterogeneous data graphsComputer Science and Information Systems10.2298/CSIS230331062B20:4(1591-1638)Online publication date: 2023
  • (2023)DomainNet: Homograph Detection and Understanding in Data Lake DisambiguationACM Transactions on Database Systems10.1145/361291948:3(1-40)Online publication date: 12-Sep-2023
  • (2023)An Optimized Lossless Graph Summarization for Large-Scale Graphs2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00061(355-362)Online publication date: 17-Dec-2023
  • (2023)Efficient and Effective Path Compression in Large Graphs2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00237(3093-3105)Online publication date: Apr-2023
  • (2023)Efficient and Compact Spreadsheet Formula Graphs2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00202(2634-2646)Online publication date: Apr-2023
  • (2023)Fair Group Summarization with Graph Patterns2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00154(1982-1994)Online publication date: Apr-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media